Abstract

We present a new pathwise approximation scheme for stochastic differential equations
driven by multidimensional Brownian motion which does not require the simulation
of Lévy area and has a Wasserstein convergence rate better than the Euler scheme's
strong error rate of $O(\sqrt{h})$, where $h$ is the step-size. By using rough path theory we
avoid imposing any non-degenerate Hörmander or ellipticity assumptions on the vector
fields of the SDE, in contrast to the similar papers of Alfonsi et al. [1], Davie [23, 24],
and Malliavin et al. [22]. The scheme is based on the log-ODE method with the Lévy
area increments replaced by Gaussian approximations with the same covariance structure.
The Wasserstein coupling is achieved by making small changes to the argument
of Davie in [23], the latter being an extension of the Komlós-Major-Tusnády Theorem.

We prove that the convergence of the scheme in the Wasserstein metric is of the order
$O(h^{1-2/\gamma-\varepsilon})$ when the vector fields are $\gamma$-Lipschitz in the sense of Stein.

Keywords. Pathwise approximation of SDEs, Wasserstein couplings, Komlós-Major-Tusnády
Theorem, log-ODE method, rough path theory, Itô map.
1 Introduction

The problem of constructing pathwise approximations of solutions to stochastic differential
equations (SDEs) driven by d-dimensional Brownian motion is difficult if a strong approximation
error of order greater than 1/2 is desired. This is because one must have the ability to
simulate iterated integrals of Brownian motion HHE2], which is hard when d ≥ 2. Efficient
algorithms for generating double integrals, that is Lévy area increments, do exist for d = 2
(see mu Ea Eg), but the general case of d > 2 is still an open problem. With this obstacle
in mind, the papers [U El El ED ED ED ED ESI ESI ESI Hi] and [ID II.9], among others, have
studied SDE approximation schemes which do not require Lévy area increments, but achieve
an order of convergence greater than 1/2. Instead of measuring the success of the scheme in
the standard $L^2$-norm, these papers use the Wasserstein metric from optimal transport theory
[82]. In particular, one constructs a probabilistic coupling between the SDE solution and an
approximation scheme such that the error is measured in the Wasserstein metric (using some
particular cost function on Wiener space).
To set up notation, let $W = (W^1, \dots, W^d)$ denote a standard d-dimensional Brownian
motion. In this paper we consider the pathwise approximation of the Stratonovich SDE

$$dx_t = V(x_t) \circ dW(t) + V_0(x_t)\,dt := \sum_{k=1}^{d} V_k(x_t) \circ dW^k(t) + V_0(x_t)\,dt, \quad t \in [0,1], \tag{1.1}$$

where $x_0 \in \mathbb{R}^q$, $V_0$ is 1-Lipschitz and the vector field collection $V = \{V_k\}_{k=1}^{d}$ is $\gamma$-Lipschitz in
the sense of Stein [77, Chapter XI] (denoted by $V_0 \in \mathrm{Lip}^1(\mathbb{R}^q)$ and $V \in \mathrm{Lip}^\gamma(\mathbb{R}^q)$). We assume
that $\gamma > 2$ so that there exists a unique solution to (1.1) almost surely (see [36, Theorem 17.3]).

1.1 Pathwise approximation scheme 

We now introduce our new approximation scheme for (1.1). Divide the unit interval [0,1] into
N pieces of length $h = N^{-1}$. Let us adopt classical ODE flow notation by setting $e^F(y_0) =
\exp(F)(y_0)$ to be the value of the solution to the following ODE at time t = 1:

$$y_t = y_0 + \int_0^t F(y_s)\,ds, \quad t \in [0,1],$$

for some suitably regular vector field $F : \mathbb{R}^q \to \mathbb{R}^q$. Define the independent normal random
variables

$$W^{(j)} \sim N(0, h I_d), \quad z^{(j)} \sim N(0, 12^{-1} h I_d), \quad \lambda^{(j)} = (\lambda^{(j)}_{kl})_{k<l} \sim N(0, 12^{-1} h^2 I_{d(d-1)/2}), \tag{1.2}$$

where j = 0, 1, ..., N − 1, and for 1 ≤ k < l ≤ d set

$$B^{(j)}_{kl} := z^{(j)}_k W^{(j)}_l - z^{(j)}_l W^{(j)}_k + \lambda^{(j)}_{kl}. \tag{1.3}$$

Our scheme is defined iteratively; $x^h_0 = x_0$, then for j = 0, 1, ..., N − 1:

$$x^h_{j+1} := \exp\Big( h V_0 + \sum_{k=1}^{d} W^{(j)}_k V_k + \sum_{1 \le k < l \le d} B^{(j)}_{kl} [V_k, V_l] \Big)(x^h_j). \tag{1.4}$$
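The following is a minimal sketch of one step of the scheme (1.2)-(1.4), not a reference implementation. It assumes the sign convention $B_{kl} = z_k W_l - z_l W_k + \lambda_{kl}$, approximates the flow $\exp(F)$ with a classical Runge-Kutta solver, and uses a hypothetical test system chosen for illustration only: $d = 2$, $q = 3$, $V_0 = 0$, $V_1 = (1, 0, -x_2/2)$, $V_2 = (0, 1, x_1/2)$, for which $[V_1, V_2] = (0, 0, 1)$ under the convention $[V, W] = DW \cdot V - DV \cdot W$. The function names are illustrative.

```python
import math
import random


def sample_increments(d, h, rng):
    """Draw W ~ N(0, h I_d) and the Gaussian surrogate B of (1.3) for one step."""
    W = [rng.gauss(0.0, math.sqrt(h)) for _ in range(d)]
    z = [rng.gauss(0.0, math.sqrt(h / 12.0)) for _ in range(d)]
    B = {}
    for k in range(d):
        for l in range(k + 1, d):
            lam = rng.gauss(0.0, h / math.sqrt(12.0))  # variance h^2 / 12
            B[(k, l)] = z[k] * W[l] - z[l] * W[k] + lam
    return W, B


def flow(F, y0, substeps=20):
    """Approximate exp(F)(y0): solve y' = F(y) on [0, 1] with classical RK4."""
    y = list(y0)
    dt = 1.0 / substeps
    for _ in range(substeps):
        k1 = F(y)
        k2 = F([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = F([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = F([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt * (p + 2.0 * q + 2.0 * r + s) / 6.0
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


def scheme_step(x, W, B):
    """One step of (1.4) for the illustrative fields V1, V2 above (V0 = 0)."""
    def F(y):
        # W1*V1(y) + W2*V2(y) + B12*[V1, V2](y), written out componentwise
        return [W[0], W[1], 0.5 * (y[0] * W[1] - y[1] * W[0]) + B[(0, 1)]]
    return flow(F, x)
```

A full path is obtained by calling `sample_increments` and `scheme_step` N times in sequence with `h = 1/N`. For this particular system the flow is available in closed form, so the ODE solver is only there to show the general pattern.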

We can think of the sequence $\{x^h_j\}_{j=1}^{N} \in (\mathbb{R}^q)^{\times N}$ as taking values in the Euclidean space $\mathbb{R}^{qN}$,
which we equip with the metric $\rho(x,y) = \max_{j=1,\dots,N} \|x_j - y_j\|_{\mathbb{R}^q}$. Given Borel measures $\mu_1, \mu_2$
on $\mathbb{R}^{qN}$, let $\mathcal{M}(\mu_1, \mu_2)$ denote the set of measurable maps $\phi : \mathbb{R}^{qN} \to \mathbb{R}^{qN}$ such that the
pushforward measure satisfies $\phi_*(\mu_1) = \mu_2$. The Wasserstein metric is defined as

$$\mathcal{W}_2(\mu_1, \mu_2) := \Big( \inf_{\phi \in \mathcal{M}(\mu_1,\mu_2)} \int_{\mathbb{R}^{qN}} \rho(x, \phi(x))^2 \, \mu_1(dx) \Big)^{1/2}.$$

The set $\mathcal{M}(\mu_1, \mu_2)$ is called the set of couplings of $\mu_1$ and $\mu_2$. An equivalent definition is given by
$\mathcal{W}_2(\mu_1, \mu_2) = \inf E(\rho(X, Y)^2)^{1/2}$, where the infimum is taken over all joint distributions of the
random variables X and Y on $\mathbb{R}^{qN}$ with marginals $\mu_1$ and $\mu_2$ respectively. The metric originates
from the Monge-Kantorovich mass transportation problem, first introduced by Monge in 1781
[65], and then rediscovered many times in many forms since by L.V. Kantorovich [45], P. Lévy,
L.N. Wasserstein [ST], among others. For more details we refer to [62, 82] and §12 of [24].
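As a rough illustration of the definition above, the sketch below computes $\mathcal{W}_2$ between two uniform empirical measures supported on equally many sample paths, using the max-over-time metric $\rho$. For such measures an optimal coupling can always be chosen to be a permutation of the samples, so a brute-force search over pairings suffices for a handful of paths. The function names are illustrative, and this is not how one would estimate the distance at scale.

```python
import itertools
import math


def rho(x, y):
    """Max-over-time metric between two discrete paths (lists of points in R^q)."""
    return max(
        math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
        for a, b in zip(x, y)
    )


def empirical_w2(xs, ys):
    """W_2 between uniform empirical measures on len(xs) == len(ys) paths.

    Brute force over all pairings: cost grows factorially in the sample size.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("both samples must contain the same number of paths")
    best = min(
        sum(rho(xs[i], ys[perm[i]]) ** 2 for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return math.sqrt(best / n)
```

For example, two samples of one-point paths at positions {0, 1} and {1, 2} are at distance 1: the pairing 0 to 1, 1 to 2 costs 1 per sample, which beats the alternative pairing.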

We now state the main result of the paper.

Theorem 1.1. Fix $h > 0$ and $x_0 \in \mathbb{R}^q$. Let $\mu$ denote the law of $\{x_{jh}\}_{j=1}^{N}$ on $\mathbb{R}^{qN}$, where
$x$ is the solution to (1.1) started at $x_0$, and let $\nu$ denote the measure given by the law of the
approximation $x^h = \{x^h_j\}_{j=1}^{N}$ on $\mathbb{R}^{qN}$. If $V_0 \in \mathrm{Lip}^1(\mathbb{R}^q)$ and $V = \{V_1, \dots, V_d\} \in \mathrm{Lip}^\gamma(\mathbb{R}^q)$ for
$\gamma > 2$, then there exists a constant $C = C(\|V\|_{\mathrm{Lip}^\gamma}, \gamma)$ such that

$$\mathcal{W}_2(\mu, \nu) \le C h^{1-2/\gamma-\varepsilon}$$

for all $\varepsilon > 0$. That is, we can find independent normal random variables as in (1.2), defined
on the same probability space as the Brownian motion W driving (1.1), such that

$$E\Big( \max_{j=1,\dots,N} \|x_{jh} - x^h_j\|^2_{\mathbb{R}^q} \Big)^{1/2} \le C h^{1-2/\gamma-\varepsilon}.$$

Thus if the vector fields V of the SDE are sufficiently regular such that $V \in \mathrm{Lip}^\gamma$ for some
$\gamma > 4$, then our scheme performs better than the $O(\sqrt{h})$ strong error rate of the traditional
Euler scheme (see [63] for Maruyama's original proof of the Euler scheme's convergence rate).
In the case of polynomial vector fields, we have a Wasserstein rate of $O(h^{1-\varepsilon})$ for any $\varepsilon > 0$ by
setting $\gamma > 2/\varepsilon$.
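The threshold $\gamma > 4$ can be read off directly from the exponent in Theorem 1.1. The short computation below is only a worked restatement of that comparison with the Euler exponent 1/2; it introduces no new assumptions.

```latex
1 - \frac{2}{\gamma} - \varepsilon > \frac{1}{2}
\iff \frac{2}{\gamma} < \frac{1}{2} - \varepsilon
\iff \gamma > \frac{4}{1 - 2\varepsilon},
\qquad 0 < \varepsilon < \tfrac{1}{2}.
```

Letting $\varepsilon \downarrow 0$ recovers $\gamma > 4$; for instance $\gamma = 8$ gives the exponent $3/4 - \varepsilon$. Similarly, if $\gamma > 2/\varepsilon$ then $1 - 2/\gamma - \varepsilon > 1 - 2\varepsilon$, which is the sense in which smooth vector fields give a rate arbitrarily close to 1.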

Remark 1.2. We can also consider other $L^p$ (rather than $L^2$) versions of the Wasserstein metric
for $p \ge 1$. Of particular note is the metric for $p = 1$:

$$\mathcal{W}_1(\mu, \nu) := \inf_{\phi \in \mathcal{M}(\mu,\nu)} \int_{\mathbb{R}^{qN}} \rho(x, \phi(x)) \, \mu(dx).$$

Certainly $\mathcal{W}_1(\mu, \nu) \le \mathcal{W}_2(\mu, \nu)$. An elegant feature of this particular Wasserstein metric is its
primal representation via functionals using the Kantorovich-Rubinstein duality formula ([82,
Theorem 5.10 and Remark 5.16]). In particular,

$$\mathcal{W}_1(\mu, \nu) = \sup_{\substack{\psi \in C(\mathbb{R}^{qN}, \mathbb{R}) \\ \mathrm{Lip}(\psi) \le 1}} \Big[ E\big( \psi(\{x_{jh}\}_{j=1}^{N}) \big) - E\big( \psi(\{x^h_j\}_{j=1}^{N}) \big) \Big], \tag{1.5}$$

where $\mathrm{Lip}(\psi) := \sup_{x \ne y} |\psi(x) - \psi(y)| / \rho(x,y)$ denotes the Lipschitz constant of $\psi$ in the classical, not Stein,
sense.

There exist examples of smooth (in fact, polynomial) vector fields $V = \{V_k\}_{k=1}^{d}$ such that
for some constant $c > 0$ the corresponding laws $\mu, \nu$ satisfy:

$$\mathcal{W}_2(\mu, \nu) \ge \mathcal{W}_1(\mu, \nu) \ge c\, h \log(h^{-1}). \tag{1.6}$$

One example is the SDE defining Lévy area (see Proposition 7.1). So $O(-h \log h)$ is a general
upper bound on the convergence rate achievable by our scheme in the Wasserstein metric. Thus for
polynomial (or more generally smooth) vector fields, our scheme achieves a Wasserstein convergence
rate which is arbitrarily close (up to a logarithmic factor) to the best possible rate.

Remark 1.3. Using the Wasserstein metric via the Kantorovich-Rubinstein duality formula is a
quick way to establish weak approximation rates for approximation schemes. The disadvantage
is that one is restricted to functionals $\psi$ such that $\mathrm{Lip}(\psi) \le 1$, while the literature considers more
general functionals (even tempered distributions [40]). However, as a form of compensation,
our scheme works for every SDE; it does not demand any ellipticity conditions on the vector
fields, unlike many papers covering weak approximations including HI El HQ].

In the context of options pricing, we can interpret (1.5) as a measure of the performance of
our scheme for weakly approximating the expectation of certain exotic Asian options (that is,
functionals of the path at the times $t \in \{jh\}_{j=0}^{N}$). This is in contrast to the weak approximation
of vanilla European options, which are functionals of the terminal value of the path. As an aside,
note that if one actually wanted to approximate $E(f(x_1))$ for some function $f \in C_b^\infty(\mathbb{R}^q, \mathbb{R})$,
then the algorithm presented by Ninomiya and Victoir in [67] does not require Lévy area
simulations either, but produces a much better weak approximation than our scheme. To be
precise, they construct a sequential ODE-based scheme which, for a given step-size $h > 0$,
outputs a point $x_1^h$ satisfying:

$$|E(f(x_1^h)) - E(f(x_1))| \le C h^2.$$

This is a whole order better than the best possible rate in general of (1.6) for exotic Asian
options. We also comment (cf. §1]) that in the case of the standard Euler and Milstein
schemes, the best order of convergence of the weak error for vanilla functionals is $O(h)$ in general.
Indeed, by the work of Talay and Tubaro ([80, Theorem 1]), this is the case when $V$ and $V_0$
are non-zero and smooth with bounded derivatives of all orders and $f \in C^\infty(\mathbb{R}^q, \mathbb{R})$ has polynomial
growth together with its derivatives. We stress that our scheme is a pathwise approximation in
that its output is meant to approximate an actual realization of the solution path, rather than
the expectation of a given functional (as in the case of Ninomiya and Victoir).

In common with the algorithm of [67], our scheme is based on the level-2 version of
the log-ODE method from rough path theory (see [nisi]). This latter approximation scheme
also consists of solving a sequence of ODEs to produce a set of points $\{x^{(j)}\}_{j=0}^{N}$. In particular:
$x^{(0)} = x_0$ and for j = 0, 1, ..., N − 1:

$$x^{(j+1)} = \exp\Big( h V_0 + \sum_{k=1}^{d} W^{(j)}_k V_k + \sum_{1 \le k < l \le d} A^{(j)}_{kl} [V_k, V_l] \Big)(x^{(j)}). \tag{1.7}$$

This scheme requires the Lévy area increments $A^{(j)} \in [\mathbb{R}^d, \mathbb{R}^d]$. For our new scheme $\{x^h_j\}_{j=1}^{N}$ we
replace these increments with the Gaussian random variables $B^{(j)}$ defined above such that the
covariance structure is the same. The theory of rough paths allows us to rewrite the original
SDE (1.1) as the solution of the following rough differential equation (RDE) with drift:

$$dx_t = V(x_t)\, d\mathbf{W}_t + V_0(x_t)\, dt,$$

where $\mathbf{W} \in C([0,1], G^{(2)}(\mathbb{R}^d))$ is the standard enhanced Brownian rough path. It turns out
that the schemes $\{x^{(j)}\}_{j=0}^{N}$ and $\{x^h_j\}_{j=0}^{N}$ can also be written as solutions of two RDEs with drift
terms. To be precise, $y_{jh} = x^{(j)}$ and $z_{jh} = x^h_j$ for all j, where $y, z \in C([0,1], \mathbb{R}^q)$ solve:

$$dy_t = V(y_t)\, d\mathbf{W}^h_t + V_0(y_t)\, dt, \quad y_0 = x_0,$$
$$dz_t = V(z_t)\, d\mathbf{X}^h_t + V_0(z_t)\, dt, \quad z_0 = x_0.$$

Here $\mathbf{W}^h, \mathbf{X}^h$ are members of a special class of 2-rough paths which we call piecewise abelian.
The notion of piecewise abelian rough paths can be thought of as the natural non-commutative
(that is, group-valued) analogue of piecewise linear approximations of paths with values in the
abelian group $G^{(1)}(\mathbb{R}^d) = \mathbb{R}^d$.

Both $\mathbf{W}^h$ and $\mathbf{X}^h$ share the same first level, which is the standard N-step piecewise linear
approximation $W^h$ of the Brownian motion W. So we consider $\mathbf{W}^h$ and $\mathbf{X}^h$ as two different rough
path lifts of the same underlying path $W^h$. Their difference at level 2 is given by the continuous
interpolations of the two discrete random walks $\mathfrak{A}^h$ and $\mathfrak{B}^h$ composed of the increments $A^{(j)}$
and $B^{(j)}$ respectively.

By constructing a probabilistic coupling of these two random walks (conditional on the
underlying Brownian increments of $W^h$), we establish an automatic coupling of $\mathbf{W}^h$ and $\mathbf{X}^h$ in
the space $G\Omega_2(\mathbb{R}^d)$ of geometric 2-rough paths. Using the Lipschitz-continuity of the Itô map
$\Xi$ of rough path theory in the inhomogeneous p-variation metric, this action induces a coupling
of the RDE solutions y and z in $C([0,1], \mathbb{R}^q)$. The situation can be described with the following
diagram (where dashed arrows represent couplings):
$$([\mathbb{R}^d, \mathbb{R}^d])^{\times N} \longrightarrow G\Omega_2(\mathbb{R}^d) \longrightarrow G\Omega_p(\mathbb{R}^d) \overset{\Xi}{\longrightarrow} C([0,1], \mathbb{R}^q)$$


The initial coupling of the random walks $\mathfrak{A}^h$ and $\mathfrak{B}^h$ is constructed by using the dyadic
coupling argument of Davie's recent paper [23]. In fact all the coupling machinery is his; we only
change the original vector to be coupled with a Gaussian vector and the rest of the proof remains
the same. Davie's coupling argument is based on a modern extension of the classical
Komlós-Major-Tusnády Theorem [T5], also known as the Hungarian Embedding Theorem. Previous
papers using the KMT method for Wasserstein approximations of SDEs include [38, TlJ J25j,
where the latter approximated SDEs driven by Lévy processes.


1.2 Previous research 

One benefit of using the technology of rough paths is that we can exploit the Lipschitz-continuity
of the Itô map

$$\Xi : G\Omega_p(\mathbb{R}^d) \to C([0,1], \mathbb{R}^q).$$

In our case this allows us to perform the coupling of the SDE and our approximation scheme at
the input-side of $\Xi$ rather than directly in the (classical) Wiener space $C([0,1], \mathbb{R}^q)$. Therefore
our coupling argument is completely independent of the vector fields V of the original SDE (1.1):
the vector fields are only relevant once we push the coupling through the Itô map. Consequently,
in contrast to the similar papers DQE1EIEHES1E3, our approach has the distinct advantage
of not imposing any non-degenerate Hörmander condition on the vector fields $V = \{V_k\}_{k=1}^{d}$.
The schemes of Malliavin et al. [3, 22] and Davie in [24] demand that the Lie bracket collection
satisfies:

$$\{[V_k, V_l](x) : 1 \le k < l \le d,\ x \in \mathbb{R}^q\} \text{ spans } \mathbb{R}^q, \tag{1.8}$$

while [23] requires the following less stringent version of the Hörmander condition.

Definition 1.4 (Davie condition). For each $x \in \mathbb{R}^q$, define the linear mapping
$L_x : \mathbb{R}^d \oplus [\mathbb{R}^d, \mathbb{R}^d] \to \mathbb{R}^q$ by

$$L_x(r, s) = \sum_{k=1}^{d} V_k(x)\, r_k + \sum_{1 \le k < l \le d} [V_k, V_l](x)\, s_{kl} \quad \text{for } r = (r_k) \in \mathbb{R}^d \text{ and } s = (s_{kl}) \in [\mathbb{R}^d, \mathbb{R}^d].$$

Denote the ball of radius $\varepsilon > 0$ centred at the origin in $\mathbb{R}^d \oplus [\mathbb{R}^d, \mathbb{R}^d]$ by $B(0, \varepsilon)$. The Davie
strengthened Hörmander condition is defined as the existence of constants $\delta > 0$ and $K > 0$
such that for all $x \in \mathbb{R}^q$, $B(0, \delta(1 + |x|)^{-K}) \subset L_x(B(0,1))$.

In other words, Davie assumes that

$$\{V_j(x), [V_k, V_l](x) : 1 \le j, k, l \le d,\ x \in \mathbb{R}^q\} \text{ spans } \mathbb{R}^q \text{ uniformly in } x. \tag{1.9}$$

Note that the former restriction (1.8) excludes the SDE defining Lévy area and every non-trivial
SDE in the simple case of q = d = 2. It also excludes the example of the SDE describing
Brownian motion on the unit circle in $\mathbb{R}^2$; in this case, (d, q) = (2,1) and the rank of the
associated Lie bracket is equal to 1 (cf. j2T, Remark 2]). On the other hand, both of these
examples satisfy (1.9) and so Davie's scheme in [23] can be applied.

Using a clever rotation of Brownian motion, Cruzeiro, Malliavin and Thalmaier [52] establish
a Wasserstein metric convergence rate of $O(h)$ for a scheme based on the traditional Milstein
approximation which does not require the simulation of Lévy area. Davie also achieves this
rate with a perturbation of the Milstein scheme, while Alfonsi, Jourdain and Kohatsu-Higa
prove in [1] that in the case of d = 1 the traditional Euler scheme has a Wasserstein rate
of convergence of $O(h^{2/3-\varepsilon})$ under the assumption of uniform ellipticity of V. Their proof
critically relies upon the Lamperti transform which cannot be extended to the case of d > 1
nor the non-elliptic case. Maintaining the ellipticity condition, the authors then extended their
result to the multidimensional case using Malliavin calculus in [5]. To be precise, the latter
paper improved the rate to $O(h\sqrt{-\log h})$ but used a weaker form of the Wasserstein metric
$\widetilde{\mathcal{W}}_2$ (which is defined in (1.11) below and is called a fixed-time approximation by Davie). These
coupling results concerning the Euler scheme cannot be extended to all SDEs. Indeed, as we
will discuss below, there exist non-elliptic SDEs for which any coupling of the Euler scheme
with the true solution is at least $O(\sqrt{h})$ apart in the Wasserstein metric ([24, Example 11.1]).

Remark 1.5. Other papers exploiting the separation of the input and output of SDE flows
provided by the Itô map of rough path theory include the ε-strong SDE simulation paper
[10] of Blanchet et al., the SDE quantization paper of [68], and the work of Riedel. The latter
paper studied Gaussian rough paths and transportation-cost inequalities from optimal transport
theory. Other examples include [34- 32] which respectively prove the Stroock-Varadhan support
theorem and the large deviation Freidlin-Wentzell estimates. These papers make elegant use of
the Itô map to reduce these non-trivial results to simpler statements about Brownian motion
and Lévy area in the rough path topology (see [36, Chapter IX]).

We also mention that in [23] Davie presents an improvement of his previous approximations
based on perturbing the Milstein scheme which achieves a Wasserstein error of order
$O(h)$ without requiring Lévy area simulation nor non-degeneracy assumptions on the vector
fields. The proof follows the work of Kloeden, Platen and Wright 03 by employing a truncation
of the Fourier series expansion of Lévy area.

1.3 Connections with original Davie scheme 

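Let us discuss the work of [23] in more detail from the perspective of the present paper. In
[23] Davie assumes there is no drift term in the original SDE (1.1), that is, $V_0 = 0$. His scheme
$\{\tilde{x}^{(j)}\}_{j=0}^{N}$ is defined by $\tilde{x}^{(0)} = x_0$ and

$$\tilde{x}^{(j+1)} = \tilde{x}^{(j)} + \sum_{k=1}^{d} W^{(j)}_k V_k(\tilde{x}^{(j)}) + \sum_{1 \le k < l \le d} B^{(j)}_{kl} [V_k, V_l](\tilde{x}^{(j)}) + \frac{1}{2} \sum_{1 \le k, l \le d} \big( W^{(j)}_k W^{(j)}_l - h\delta_{kl} \big) \sum_{m=1}^{q} V_k^m \frac{\partial V_l}{\partial x_m}(\tilde{x}^{(j)}). \tag{1.10}$$

In other words, it is the Milstein scheme ([63]) with the Lévy area increments replaced with
the Gaussian approximations defined in (1.3). Denote the laws of $\{\tilde{x}^{(j)}\}_{j=1}^{N}$ and $\{x_{jh}\}_{j=1}^{N}$
on $\mathbb{R}^{qN}$ by $\lambda$ and $\mu$ respectively. Under the non-degeneracy condition of (1.9), Davie establishes
the Wasserstein-type bound ([23, Theorem 1]):

$$\widetilde{\mathcal{W}}_2(\mu, \lambda) := \inf_{\phi \in \mathcal{M}(\mu,\lambda)} \max_{j=1,\dots,N} \Big( \int_{\mathbb{R}^{qN}} \|x_j - \phi(x)_j\|^2_{\mathbb{R}^q} \, \mu(dx) \Big)^{1/2} \le C h. \tag{1.11}$$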
Theorem 1.1 allows us to prove our own coupling result for Davie's scheme (1.10), which
holds irrespectively of the rank of the SDE vector fields and their Lie brackets.

Corollary 1.6. Let x be the solution of (1.1) in the case that $V_0 = 0$. Fix $h > 0$ and denote
the laws of $\{\tilde{x}^{(j)}\}_{j=1}^{N}$ and $\{x_{jh}\}_{j=1}^{N}$ on $\mathbb{R}^{qN}$ by $\lambda, \mu$ respectively. If $V = \{V_1, \dots, V_d\} \in \mathrm{Lip}^\gamma(\mathbb{R}^q)$
for $\gamma > 2$, then there exists a constant $C = C(\|V\|_{\mathrm{Lip}^\gamma}, \gamma)$ such that for all $\varepsilon > 0$

$$\mathcal{W}_2(\mu, \lambda) \le C h^{1-2/\gamma-\varepsilon}.$$

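Proof. The proof is quick. Since the drift is zero, the scheme $\{\tilde{x}^{(j)}\}_{j=0}^{N}$ given by (1.10) is a
discrete martingale with respect to the filtration $\mathcal{F}_j := \sigma\{(W^{(i)}, B^{(i)}) : i < j\}$. Similarly, since
$V_0 = 0$, Fubini's Theorem gives us

$$E\big( x^h_{j+1} \,\big|\, \mathcal{F}_j \big) = E\Big( \exp\Big( \sum_{k=1}^{d} W^{(j)}_k V_k + \sum_{1 \le k < l \le d} B^{(j)}_{kl} [V_k, V_l] \Big)(x^h_j) \,\Big|\, \mathcal{F}_j \Big) = x^h_j.$$

Thus the difference process $\{\tilde{x}^{(j)} - x^h_j\}_{j=0}^{N}$ is also a martingale with respect to the filtration
$(\mathcal{F}_j)_{j=0}^{N}$. By considering stochastic and deterministic Taylor expansions (see [HI §3]), it can be
shown that $E(\|\tilde{x}^{(N)} - x^h_N\|^2_{\mathbb{R}^q}) \le C h^2$ for some constant $C > 0$. Indeed, the scheme (1.10) is
the truncated version of the two-step Taylor expansion based numerical approximation of the
ODE sequence (1.4). Alternatively, each vector field in the scheme (1.4) can be decomposed as

$$\sum_{k=1}^{d} W^{(j)}_k V_k + \sum_{1 \le k < l \le d} B^{(j)}_{kl} [V_k, V_l] = \underbrace{\sum_{k=1}^{d} W^{(j)}_k V_k + \sum_{1 \le k < l \le d} A^{(j)}_{kl} [V_k, V_l]}_{=: G^{(j)}} + \underbrace{\sum_{1 \le k < l \le d} \big( B^{(j)}_{kl} - A^{(j)}_{kl} \big) [V_k, V_l]}_{=: H^{(j)}}.$$

We can then view the Davie scheme (1.10) as originating when we take the second order
expansion of $\exp(G^{(j)})$ (precisely as in the Milstein scheme), and a first order approximation of
the $\exp(H^{(j)})$ component (the latter being composed of coefficients with scaling $O(h)$ in $L^2$).
To be precise: $\exp(G^{(j)} + H^{(j)}) \approx \hat{G}^{(j)} + \hat{H}^{(j)}$, where

$$\hat{G}^{(j)} := \sum_{k=1}^{d} W^{(j)}_k V_k + \sum_{1 \le k < l \le d} A^{(j)}_{kl} [V_k, V_l] + \frac{1}{2} \sum_{1 \le k, l \le d} \big( W^{(j)}_k W^{(j)}_l - \delta_{kl} h \big) \sum_{m=1}^{q} V_k^m \frac{\partial V_l}{\partial x_m},$$

$$\hat{H}^{(j)} := \sum_{1 \le k < l \le d} \big( B^{(j)}_{kl} - A^{(j)}_{kl} \big) [V_k, V_l],$$

and so $\hat{G}^{(j)} + \hat{H}^{(j)}$ gives (1.10) exactly. Thus the local $L^2$-error of the scheme is $O(h^{3/2})$, from
which it follows that $E(\|\tilde{x}^{(N)} - x^h_N\|^2_{\mathbb{R}^q}) \le C h^2$.

Hence Doob's maximal inequality yields

$$E\Big( \max_{j=1,\dots,N} \|\tilde{x}^{(j)} - x^h_j\|^2_{\mathbb{R}^q} \Big) \le 4 E\big( \|\tilde{x}^{(N)} - x^h_N\|^2_{\mathbb{R}^q} \big) \le C h^2.$$

Combining this inequality with Theorem 1.1 via the triangle inequality, we arrive at the claim.

□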
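For the reader's convenience, the last step of the proof above can be written out as follows. This is a sketch that assumes the notation $x^h_j$ for the scheme (1.4) with law $\nu$, and $\tilde{x}^{(j)}$ for Davie's scheme (1.10) with law $\lambda$; both schemes are built from the same increments $(W^{(j)}, B^{(j)})$, which provides a coupling of $\nu$ and $\lambda$.

```latex
\mathcal{W}_2(\mu,\lambda)
  \le \mathcal{W}_2(\mu,\nu) + \mathcal{W}_2(\nu,\lambda)
  \le C h^{1-2/\gamma-\varepsilon}
      + E\Big(\max_{j=1,\dots,N}\big\|x^h_j-\tilde{x}^{(j)}\big\|^2_{\mathbb{R}^q}\Big)^{1/2}
  \le C h^{1-2/\gamma-\varepsilon} + C' h
  \le C'' h^{1-2/\gamma-\varepsilon}.
```

The final inequality uses $h \le 1$, so that $h \le h^{1-2/\gamma-\varepsilon}$.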

Thus in the case of a polynomial vector field system, we can prove that Davie's scheme
achieves a Wasserstein rate of $O(h^{1-\varepsilon})$ for any $\varepsilon > 0$ by setting $\gamma > 2/\varepsilon$. While Theorem 1.1 and
Corollary 1.6 do not achieve a convergence rate of $O(h)$, they are in some sense an improvement
on Davie's bound (1.11) since we do not impose any Hörmander restrictions on our vector fields
and we use a stronger form of the Wasserstein metric. Certainly $\widetilde{\mathcal{W}}_2(\mu, \lambda) \le \mathcal{W}_2(\mu, \lambda)$ (which
is analogous to the inequality $\max_j E(\|X_j - Y_j\|^2) \le E(\max_j \|X_j - Y_j\|^2)$).

In the final remarks of [23] Davie conjectured that it might be possible to prove that
$\mathcal{W}_2(\mu, \lambda) \le C h$ for any SDE (that is, without assuming the non-degeneracy condition (1.9)).
In a sense, Corollary 1.6 is a partial answer to this conjecture; given a vector field collection of
sufficient regularity in the Stein-Lipschitz sense, we can prove that the approximation schemes
(1.4) and (1.10) converge with order $O(h^{1-\varepsilon})$ in our stronger Wasserstein metric. Moreover,
this paper proves that the best possible rate in general is $O(-h \log h)$ (see Corollary 7.2 and
Remark 7.3). This is established by adapting the counterexample given in [23] which originally
showed that $\mathcal{W}_2(\lambda, \mu) \ge C h \log(h^{-1})$, where $\lambda, \mu$ correspond to the measures arising from
the SDE defining the Lévy area of Brownian motion. Note that since the vector fields are
polynomial in this case, we can set $\gamma > 2/\varepsilon$ to achieve a rate of $O(h^{1-\varepsilon})$ from Theorem 1.1.

One disadvantage of our approach is that we need to assume moderate regularity conditions
on the vector fields, namely $V \in \mathrm{Lip}^\gamma(\mathbb{R}^q)$ with $\gamma > 4$. In effect, we are trading the algebraic
Hörmander regularity of our vector fields in exchange for increased analytic Stein-Lipschitz conditions.


1.4 Optimal Wasserstein rate for Euler scheme 

We now prove that in general the Wasserstein distance between the Euler scheme and the
true solution is precisely of order $O(\sqrt{h})$. Alfonsi et al. can establish the rate of $O(h\sqrt{-\log h})$
under the assumption of uniform ellipticity of the vector fields; thus we give the details of the
non-elliptic counterexample previously sketched in [24, §11].

Example 1.7. Consider the system with d = 1, q = 2 given by

$$dx_1 = x_2\, dW, \quad dx_2 = -x_1\, dW, \quad x(0) = (1, 0).$$

In this example the vector field is about as regular as possible without being trivial;
certainly $V(x) = (x_2, -x_1)^T$ is linear. Setting $S(t) := x_1(t)^2 + x_2(t)^2$, we find that S satisfies the
deterministic differential equation $dS = S\,dt$ with $S(0) = 1$, and so we must have $S(t) = e^t$. Note
that $S(1) = e$. The Euler scheme is given by $x_1^{(j+1)} = x_1^{(j)} + x_2^{(j)} W^{(j)}$ and $x_2^{(j+1)} = x_2^{(j)} - x_1^{(j)} W^{(j)}$.
Writing $S^{(j)} := (x_1^{(j)})^2 + (x_2^{(j)})^2$, it follows that $S^{(j+1)} = (1 + (W^{(j)})^2)\, S^{(j)}$ and so

$$S^{(n)} = \prod_{j=0}^{n-1} \big( 1 + (W^{(j)})^2 \big).$$

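We claim that there exists a universal constant $C_1 > 0$ such that

$$\big\| e - S^{(N)} \big\|_{L^1} \ge C_1 \sqrt{h}. \tag{1.12}$$

Indeed, using the inequality $|e^a - e^b| = |\int_a^b e^x\, dx| \ge |a - b|$ for $a, b \ge 0$, and the fact that
$|\log(1 + x) - x| = O(x^2)$ for small $|x|$, we have

$$E\big( |e - S^{(N)}| \big) \ge E\big( |1 - \log(S^{(N)})| \big) = \Big\| 1 - \sum_{j=1}^{N} \log(1 + h Z_j^2) \Big\|_{L^1}$$
$$= \Big\| 1 - h \sum_{j=1}^{N} Z_j^2 + \sum_{j=1}^{N} \big( h Z_j^2 - \log(1 + h Z_j^2) \big) \Big\|_{L^1} \ge \Big\| 1 - h \sum_{j=1}^{N} Z_j^2 \Big\|_{L^1} - C_2 h,$$

where the $\{Z_j\}_{j=1}^{N}$ are independent N(0, 1) random variables. Markov's inequality and the
Central Limit Theorem guarantee that

$$\sqrt{N}\, E\big( |e - S^{(N)}| \big) \ge \sqrt{N}\, \Big\| 1 - h \sum_{j=1}^{N} Z_j^2 \Big\|_{L^1} + O(\sqrt{h}) \ge P\Big( \Big| \sqrt{N} \Big( h \sum_{j=1}^{N} Z_j^2 - 1 \Big) \Big| \ge 1 \Big) + O(\sqrt{h}),$$

where the probability on the right converges to a strictly positive limit as $N \to \infty$,
and (1.12) follows. The Cauchy-Schwarz inequality then implies that

$$C_1 \sqrt{h} \le \big\| e - S^{(N)} \big\|_{L^1} = \big\| \|x(1)\|^2 - \|x^{(N)}\|^2 \big\|_{L^1} \le \big\| \sqrt{e} - \|x^{(N)}\| \big\|_{L^2} \, \big\| \sqrt{e} + \|x^{(N)}\| \big\|_{L^2} = C_3 \big\| \sqrt{e} - \|x^{(N)}\| \big\|_{L^2},$$

where $C_3 > 0$. As $\|x(1)\|^2 = S(1) = e$ holds independently of the driving Brownian motion,
this last line shows that the error of the Euler approximation cannot be less than $O(\sqrt{h})$ no
matter what coupling is used. That is, there exists a constant $C > 0$ such that

$$\mathcal{W}_2\big( \mathcal{L}(\{x_{jh}\}_{j=1}^{N}), \mathcal{L}(\{x^{(j)}\}_{j=1}^{N}) \big) \ge \mathcal{W}_2\big( \mathcal{L}(x(1)), \mathcal{L}(x^{(N)}) \big) \ge C \sqrt{h}.$$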


On the other hand, since d = 1, Lévy area is absent and hence the Milstein and (level 2)
log-ODE schemes both give strong convergence errors of order $O(h)$. Moreover, the lack of Lévy
area means that the level 1 and 2 log-ODE methods coincide.
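The computation in Example 1.7 can be checked numerically. The sketch below reads the system in the Itô sense (which is what the relation $dS = S\,dt$ requires), runs the Euler scheme, verifies the product formula for $S^{(N)}$ along each simulated path, and estimates $E|e - S^{(N)}|$ by Monte Carlo. The function names and sample sizes are illustrative choices, not taken from the paper.

```python
import math
import random


def euler_radius_squared(N, rng):
    """Run N Euler steps for dx1 = x2 dW, dx2 = -x1 dW from (1, 0).

    Returns (S, P): S = x1^2 + x2^2 at the final step, and the product
    P = prod_j (1 + (W^(j))^2), which should agree with S up to rounding.
    """
    h = 1.0 / N
    x1, x2 = 1.0, 0.0
    prod = 1.0
    for _ in range(N):
        dW = rng.gauss(0.0, math.sqrt(h))
        x1, x2 = x1 + x2 * dW, x2 - x1 * dW
        prod *= 1.0 + dW * dW
    return x1 * x1 + x2 * x2, prod


def mean_abs_error(N, samples, rng):
    """Monte Carlo estimate of E|e - S^(N)|."""
    total = 0.0
    for _ in range(samples):
        s, _ = euler_radius_squared(N, rng)
        total += abs(math.e - s)
    return total / samples
```

Calling `mean_abs_error` for increasing N (say N = 25, 100, 400) with a fixed number of samples should show the error roughly halving each time N is quadrupled, in line with the $\sqrt{h}$ lower bound (1.12).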


1.5 Outline and notation of the paper 

The paper is organised as follows: the next section briefly summarises the necessary elements
of rough path theory, which is then followed by Section 3, where the original log-ODE method
is introduced. We then define the class of piecewise abelian rough paths in Section 4 and show
that the log-ODE method can be recast as the solution of an RDE driven by such a rough
path $\mathbf{W}^h$. Section 5 examines the components of Lévy area and defines our Gaussian piecewise
abelian approximation $\mathbf{X}^h$. The coupling of the random walks using Davie's coupling result is
established in the following section. Having constructed this coupling, Section 7 then examines
the induced coupling between $\mathbf{W}^h$ and $\mathbf{X}^h$. Section 8 provides Wasserstein error estimates for
the rough path lifts of the two piecewise abelian rough paths. These estimates are then put to
use in Section 9 to prove Corollary 9.1 from which Theorem 1.1 follows immediately. Section 10
offers some concluding remarks regarding the feasibility of extensions of the main results to the
case of fractional Brownian motion and higher order approximations. The paper is concluded
with an appendix on the iterated Baker-Campbell-Hausdorff formula for random walks on Lie
groups.

Notation. Throughout the paper, C, c, ... denote various deterministic constants (that may
vary from line to line), which are independent of h or n, m, k. Constants which are dependent
upon a variable will have the dependency explicitly stated. The usual bracket operation on $\mathbb{R}^d$
is given by $[x, y] = x \otimes y - y \otimes x = \sum_{1 \le k < l \le d} (x_k y_l - x_l y_k)[e_k, e_l]$, where $[e_k, e_l] := e_k \otimes e_l - e_l \otimes e_k \in
[\mathbb{R}^d, \mathbb{R}^d]$, and $\{e_k\}_{k=1}^{d}$ denotes the canonical basis of $\mathbb{R}^d$.
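As a small illustration of the bracket notation, the sketch below represents $[x, y]$ both as an antisymmetric $d \times d$ matrix and through its coordinates with respect to the basis $\{[e_k, e_l]\}_{k<l}$. Indices are zero-based in the code, and the function names are illustrative.

```python
def bracket(x, y):
    """[x, y] = x (x) y - y (x) x, returned as an antisymmetric d x d matrix."""
    d = len(x)
    return [[x[k] * y[l] - y[k] * x[l] for l in range(d)] for k in range(d)]


def bracket_coordinates(x, y):
    """Coefficients (x_k y_l - x_l y_k) of [e_k, e_l] for k < l."""
    d = len(x)
    return {
        (k, l): x[k] * y[l] - x[l] * y[k]
        for k in range(d)
        for l in range(k + 1, d)
    }
```

For $d = 2$ there is a single coordinate, so `bracket_coordinates([1, 0], [0, 1])` returns `{(0, 1): 1}`, the coefficient of $[e_1, e_2]$.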


2 Elements of rough path theory 

This section provides a quick tailored overview of relevant rough path theory and we take the
opportunity to establish notation. Rough path analysis provides a method of constructing
solutions to differential equations driven by paths that are not of bounded variation but have
controlled roughness. A measure of this roughness is given by the p-variation of the path (see
(2.1) below). For a detailed overview of the theory we direct the reader to [3D], ESI EDI El, E2]
Eli, 55] among a multitude of others.


2.1 Algebraic preliminaries 

We introduce the necessary algebraic and geometric machinery in order to define rough path 
analysis. The foundation of the theory is given by the free tensor algebra and the free nilpotent 
Lie group embedded in it. Denote the space of continuous paths $x : [0,1] \to \mathbb{R}^d$ by $C([0,1], \mathbb{R}^d)$.
Writing $x_{s,t} := x_t - x_s$ for the increment, given $p \ge 1$ we define the p-variation norm of x by

$$\|x\|_{p\text{-var};[0,1]} := \sup_{D = (t_i) \subset [0,1]} \Big( \sum_{i} \big\| x_{t_i, t_{i+1}} \big\|^p_{\mathbb{R}^d} \Big)^{1/p}. \tag{2.1}$$

Let us denote by $C^{p\text{-var}}([0,1], \mathbb{R}^d)$ the linear subspace of $C([0,1], \mathbb{R}^d)$ consisting of paths of finite
p-variation. In the case of $x \in C^{p\text{-var}}([0,1], \mathbb{R}^d)$ for $p \in [1,2)$, the iterated integrals of x are
canonically defined via Young integration [83]. The collection of all these integrals as an object
in itself is called the signature of the path:

$$S(x)_{s,t} := 1 + \sum_{k=1}^{\infty} \int_{s < t_1 < \dots < t_k < t} dx_{t_1} \otimes \dots \otimes dx_{t_k},$$

where $(s,t) \in \Delta_{[0,1]} := \{(s,t) : 0 \le s \le t \le 1\}$. Adopting the convention that $(\mathbb{R}^d)^{\otimes 0} = \mathbb{R}$, we
formally define the tensor algebras

$$T^{(\infty)}(\mathbb{R}^d) := \bigoplus_{k=0}^{\infty} (\mathbb{R}^d)^{\otimes k}, \quad T^{(n)}(\mathbb{R}^d) := \bigoplus_{k=0}^{n} (\mathbb{R}^d)^{\otimes k}.$$

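We can see that the signature of x takes its values in $T^{(\infty)}(\mathbb{R}^d)$. Defining the canonical
projections $\pi_n : T^{(\infty)}(\mathbb{R}^d) \to (\mathbb{R}^d)^{\otimes n}$ and $\pi_{0,n} : T^{(\infty)}(\mathbb{R}^d) \to T^{(n)}(\mathbb{R}^d)$, we can also consider the
truncated signature:

$$S_n(x)_{s,t} := \pi_{0,n}(S(x)_{s,t}) = 1 + \sum_{k=1}^{n} \int_{s < t_1 < \dots < t_k < t} dx_{t_1} \otimes \dots \otimes dx_{t_k} \in \bigoplus_{k=0}^{n} (\mathbb{R}^d)^{\otimes k} = T^{(n)}(\mathbb{R}^d).$$

Thus we can view $S_n(x)$ as a continuous mapping from $\Delta_{[0,1]}$ to $T^{(n)}(\mathbb{R}^d)$. Given a coordinate
$e_{i_1} \otimes \dots \otimes e_{i_k} \in (\mathbb{R}^d)^{\otimes k}$, we define the corresponding projection of the signature via the dual
space. For example,

$$\langle e^*_{i_1} \otimes \dots \otimes e^*_{i_k}, S(x)_{s,t} \rangle = \int_{s < t_1 < \dots < t_k < t} dx^{i_1}_{t_1} \dots dx^{i_k}_{t_k} \in \mathbb{R}.$$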
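To make the truncated signature concrete, the sketch below computes $S_2(x)_{0,1}$ for a piecewise linear path. It relies on two standard facts: a straight segment with increment $\delta$ has level-2 signature $(1, \delta, \delta \otimes \delta / 2)$, and signatures of concatenated paths multiply in the truncated tensor algebra (Chen's identity). The function names are illustrative.

```python
def segment_signature(delta):
    """Levels 1 and 2 of the signature of a straight segment with increment delta."""
    d = len(delta)
    level2 = [[0.5 * delta[i] * delta[j] for j in range(d)] for i in range(d)]
    return list(delta), level2


def chen_product(a, b):
    """Product (1, a1, a2) (x) (1, b1, b2), truncated above level 2."""
    a1, a2 = a
    b1, b2 = b
    d = len(a1)
    c1 = [a1[i] + b1[i] for i in range(d)]
    c2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(d)]
          for i in range(d)]
    return c1, c2


def signature_level2(points):
    """Levels 1 and 2 of the signature of the piecewise linear path through points."""
    d = len(points[0])
    sig = ([0.0] * d, [[0.0] * d for _ in range(d)])
    for p, q in zip(points, points[1:]):
        delta = [v - u for u, v in zip(p, q)]
        sig = chen_product(sig, segment_signature(delta))
    return sig
```

For the path (0,0) to (1,0) to (1,1), the level-2 term is [[0.5, 1.0], [0.0, 0.5]]; half the difference of its off-diagonal entries, 0.5, is the area enclosed between the path and its chord.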

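We equip each $(\mathbb{R}^d)^{\otimes k}$ with the tensor algebra norm $\|\cdot\|_{(\mathbb{R}^d)^{\otimes k}}$ defined by

$$\|a\|_{(\mathbb{R}^d)^{\otimes k}} := \Big( \sum_{1 \le i_1, \dots, i_k \le d} \big| \langle e^*_{i_1} \otimes \dots \otimes e^*_{i_k}, a \rangle \big|^2 \Big)^{1/2},$$

and when no confusion arises we shall simply write $\|a\|$. This norm satisfies a compatibility
relation between the tensor norms on the respective tensor levels in that

$$\forall (a, b) \in (\mathbb{R}^d)^{\otimes l} \times (\mathbb{R}^d)^{\otimes (k-l)} : \quad \|a \otimes b\|_{(\mathbb{R}^d)^{\otimes k}} = \|a\|_{(\mathbb{R}^d)^{\otimes l}} \, \|b\|_{(\mathbb{R}^d)^{\otimes (k-l)}}.$$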
We define a norm on $T^{(n)}(\mathbb{R}^d)$ by

$$\|g - h\|_{T^{(n)}(\mathbb{R}^d)} := \sum_{k=0}^{n} \|\pi_k(g - h)\|_{(\mathbb{R}^d)^{\otimes k}}, \quad g, h \in T^{(n)}(\mathbb{R}^d),$$

which turns $T^{(n)}(\mathbb{R}^d)$ into a Banach algebra. It is a well-known fact that the signature $S_n(x)$ not
only takes its values in $T^{(n)}(\mathbb{R}^d)$, but it lies in a special nilpotent Lie group embedded in the
tensor algebra. To be precise, the level-n signature takes its values in the free step-n nilpotent
Lie group with the generators $\{e_k\}_{k=1}^{d}$, which we denote by $G^{(n)}(\mathbb{R}^d)$. Indeed, defining the free
step-n Lie algebra by

$$\mathfrak{g}^{(n)}(\mathbb{R}^d) := \mathbb{R}^d \oplus [\mathbb{R}^d, \mathbb{R}^d] \oplus \dots \oplus \underbrace{[\mathbb{R}^d, [\dots, [\mathbb{R}^d, \mathbb{R}^d]]]}_{(n-1) \text{ brackets}},$$

and the natural non-commutative exponential $\exp_n : T^{(n)}(\mathbb{R}^d) \to T^{(n)}(\mathbb{R}^d)$ by

$$\exp_n(a) := 1 + \sum_{k=1}^{n} \frac{a^{\otimes k}}{k!},$$

we have $G^{(n)}(\mathbb{R}^d) = \exp_n(\mathfrak{g}^{(n)}(\mathbb{R}^d))$. Again using a formal power series, we can also define the
truncated logarithm on $T^{(n)}(\mathbb{R}^d)$:

$$\log_n(a) := -\sum_{k=1}^{n} \frac{1}{k} (1 - a)^{\otimes k} \quad \text{for } a \in T^{(n)}(\mathbb{R}^d) \text{ such that } \pi_0(a) = 1.$$

The following characterization summarises the situation (a proof can be found in [36, Theorem
7.30]).

Theorem 2.1 (Chow-Rashevskii). We have

$$G^{(n)}(\mathbb{R}^d) = \big\{ S_n(x)_{0,1} : x \in C^{1\text{-var}}([0,1], \mathbb{R}^d) \big\}.$$

More abstractly, after fixing $p \ge 1$ we can consider a continuous group-valued path
$\mathbf{X} : [0,1] \to G^{(\lfloor p \rfloor)}(\mathbb{R}^d)$,

$$\mathbf{X}_t = (1, \mathbf{X}^1_t, \dots, \mathbf{X}^{\lfloor p \rfloor}_t) \in G^{(\lfloor p \rfloor)}(\mathbb{R}^d) \quad \text{where } \pi_k(\mathbf{X}_t) = \mathbf{X}^k_t.$$

Importantly, the group structure provides a natural non-commutative notion of increment:
$\mathbf{X}_{s,t} := \mathbf{X}_s^{-1} \otimes \mathbf{X}_t$. This multiplication operation is well-defined by Chen's Theorem [53, Theorem
2.9].


2.2 Carnot-Caratheodory norm 

There exists a symmetric and sub-additive norm on $G^{(\lfloor p \rfloor)}(\mathbb{R}^d)$ which is homogeneous with
respect to the natural dilation operator on the tensor algebra (see [TB] for details). This
so-called Carnot-Carathéodory norm is given by

$$\|g\|_C := \inf \Big\{ \int_0^1 |d\gamma_t| : \gamma \in C^{1\text{-var}}([0,1], \mathbb{R}^d) \text{ such that } S_{\lfloor p \rfloor}(\gamma)_{0,1} = g \Big\},$$

which is well-defined by the Chow-Rashevskii Theorem. We may then define the homogeneous
p-variation metric between p-rough paths $\mathbf{X}, \mathbf{Y} \in C([0,1], G^{(\lfloor p \rfloor)}(\mathbb{R}^d))$:

$$d_{p\text{-var};[0,1]}(\mathbf{X}, \mathbf{Y}) := \sup_{D = (t_i) \subset [0,1]} \Big( \sum_{i} \big\| \mathbf{X}^{-1}_{t_i, t_{i+1}} \otimes \mathbf{Y}_{t_i, t_{i+1}} \big\|_C^p \Big)^{1/p}.$$

The p-variation norm is given by $\|\mathbf{X}\|_{p\text{-var};[0,1]} = d_{p\text{-var};[0,1]}(\mathbf{X}, 1)$. If this latter quantity is finite,
then $\omega(s,t) := \|\mathbf{X}\|^p_{p\text{-var};[s,t]}$ is a control; a continuous bounded function, which vanishes on the
diagonal $\{(t,t) : t \in [0,1]\}$, and is super-additive in that for all $s \le u \le t$ in [0,1]:

$$\omega(s, u) + \omega(u, t) \le \omega(s, t).$$

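Similarly we define the homogeneous 1/p-Hölder metric and norm by

$$d_{1/p\text{-Höl};[0,1]}(\mathbf{X}, \mathbf{Y}) := \sup_{0 \le s < t \le 1} \frac{\big\| \mathbf{X}^{-1}_{s,t} \otimes \mathbf{Y}_{s,t} \big\|_C}{|t - s|^{1/p}}, \quad \|\mathbf{X}\|_{1/p\text{-Höl};[0,1]} := d_{1/p\text{-Höl};[0,1]}(\mathbf{X}, 1),$$

and define the rough path spaces:

$$C^{p\text{-var}}([0,1], G^{(\lfloor p \rfloor)}(\mathbb{R}^d)) = \big\{ \mathbf{X} \in C([0,1], G^{(\lfloor p \rfloor)}(\mathbb{R}^d)) : \|\mathbf{X}\|_{p\text{-var};[0,1]} < \infty \big\},$$
$$C^{1/p\text{-Höl}}([0,1], G^{(\lfloor p \rfloor)}(\mathbb{R}^d)) = \big\{ \mathbf{X} \in C([0,1], G^{(\lfloor p \rfloor)}(\mathbb{R}^d)) : \|\mathbf{X}\|_{1/p\text{-Höl};[0,1]} < \infty \big\}.$$

We stress that these spaces are not vector spaces; the addition of two rough paths, while being
well-defined in the tensor algebra, may not sum up to a group element.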


2.3 Inhomogeneous metrics on rough path space 

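The inhomogeneous $p$-variation and $1/p$-Hölder metrics for $p$-rough paths are defined by ignoring the group structure of $G^{(\lfloor p \rfloor)}(\mathbb{R}^d)$, and instead using the inherited norm from the tensor algebra:

$$\rho_{p\text{-var};[s,t]}(\mathbf{X},\mathbf{Y}) := \max_{k=1,\ldots,\lfloor p \rfloor} \rho^{(k)}_{p\text{-var};[s,t]}(\mathbf{X},\mathbf{Y}), \qquad \rho_{1/p\text{-Höl};[0,1]}(\mathbf{X},\mathbf{Y}) := \max_{k=1,\ldots,\lfloor p \rfloor} \rho^{(k)}_{1/p\text{-Höl};[0,1]}(\mathbf{X},\mathbf{Y}),$$

where

$$\rho^{(k)}_{p\text{-var};[s,t]}(\mathbf{X},\mathbf{Y}) := \sup_{(t_i)\subset[s,t]} \left( \sum_i \left\| \pi_k\left(\mathbf{X}_{t_i,t_{i+1}} - \mathbf{Y}_{t_i,t_{i+1}}\right) \right\|^{p/k} \right)^{k/p},$$

$$\rho^{(k)}_{1/p\text{-Höl};[0,1]}(\mathbf{X},\mathbf{Y}) := \sup_{0 \leq s < t \leq 1} \frac{\left\| \pi_k\left(\mathbf{X}_{s,t} - \mathbf{Y}_{s,t}\right) \right\|}{|t-s|^{k/p}}.$$

Given a control function $\omega : \Delta_{[0,1]} \to [0,\infty)$, we also define the metric

$$\rho_{p\text{-}\omega;[0,1]}(\mathbf{X},\mathbf{Y}) = \max_{k=1,\ldots,\lfloor p \rfloor} \rho^{(k)}_{p\text{-}\omega;[0,1]}(\mathbf{X},\mathbf{Y}), \quad \text{where} \quad \rho^{(k)}_{p\text{-}\omega;[0,1]}(\mathbf{X},\mathbf{Y}) = \sup_{0 \leq s < t \leq 1} \frac{\left\| \pi_k\left(\mathbf{X}_{s,t} - \mathbf{Y}_{s,t}\right) \right\|}{\omega(s,t)^{k/p}}.$$
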


2.4 Geometric rough paths 

The space of weakly geometric $p$-rough paths will be denoted by $WG\Omega_p(\mathbb{R}^d)$. This is the set of continuous paths $\mathbf{X}$ with values in $G^{(\lfloor p \rfloor)}(\mathbb{R}^d)$ such that $\|\mathbf{X}\|_{p\text{-var};[0,1]} < \infty$. A refinement of this notion is the space of geometric $p$-rough paths, denoted by $G\Omega_p(\mathbb{R}^d)$, which is the closure of

$$\left\{ S_{\lfloor p \rfloor}(x) : x \in C^{1\text{-var}}([0,1],\mathbb{R}^d) \right\}$$

with respect to the rough path metric $d_{p\text{-var}}$ (or equivalently $\rho_{p\text{-var}}$). Certainly we have the inclusion $G\Omega_p(\mathbb{R}^d) \subset WG\Omega_p(\mathbb{R}^d)$ and it turns out that this inclusion is strict. As described in [53, §3.2.2], this insignificant difference between geometric and weakly geometric rough paths can be compared to the difference between $C^1$ and Lipschitz functions.

We make a note that given $\mathbf{X} \in WG\Omega_p(\mathbb{R}^d)$ and some $q > p$, there is a canonical extension of $\mathbf{X}$ to a $q$-rough path $S_{\lfloor q \rfloor}(\mathbf{X}) \in WG\Omega_q(\mathbb{R}^d)$ such that $\pi_{0,\lfloor p \rfloor}(S_{\lfloor q \rfloor}(\mathbf{X})) = \mathbf{X}$. This so-called rough path lift operation is unique in the sense that there exists a constant $C = C(p,q)$ such that

$$\|\mathbf{X}\|_{p\text{-var};[0,1]} \leq \left\|S_{\lfloor q \rfloor}(\mathbf{X})\right\|_{p\text{-var};[0,1]} \leq C \|\mathbf{X}\|_{p\text{-var};[0,1]}.$$

2.5 Rough differential equations 

For now let $x \in C^{1\text{-var}}([0,1],\mathbb{R}^d)$ and let $y \in C([0,1],\mathbb{R}^q)$ denote the solution of the (controlled) ordinary differential equation

$$dy_t = V(y_t)\,dx_t + V_0(y_t)\,dt := \sum_{k=1}^{d} V_k(y_t)\,dx_t^k + V_0(y_t)\,dt, \qquad y_0 \in \mathbb{R}^q,$$

which we summarise with the notation $y = \pi_{(V,V_0)}(y_0, x)$. Here $\{V_k\}_{k=0}^d$ is a collection of suitably regular vector fields $V_k : \mathbb{R}^q \to \mathbb{R}^q$.

Definition 2.2. Let $\mathbf{X} \in WG\Omega_p(\mathbb{R}^d)$ for some $p \geq 1$. We say that $y \in C([0,1],\mathbb{R}^q)$ is a solution to the rough differential equation (RDE) with drift driven by $\mathbf{X}$ along the collection of vector fields $V = \{V_k\}_{k=1}^d$, $V_0$, and started from $y_0 \in \mathbb{R}^q$, if there exists a sequence $\{x_n\}_{n=1}^{\infty} \subset C^{1\text{-var}}([0,1],\mathbb{R}^d)$ such that:

$$\lim_{n\to\infty} \sup_{0 \leq s < t \leq 1} \left\| \mathbf{X}_{s,t}^{-1} \otimes S_{\lfloor p \rfloor}(x_n)_{s,t} \right\|_C = 0, \qquad \sup_n \left\| S_{\lfloor p \rfloor}(x_n) \right\|_{p\text{-var};[0,1]} < \infty,$$

and the sequence of ODE solutions $y_n := \pi_{(V,V_0)}(y_0, x_n)$ satisfies:

$$\|y_n - y\|_{\infty} \to 0 \quad \text{as } n \to \infty.$$

We denote this situation with the (formal) equation $dy_t = V(y_t)\,d\mathbf{X}_t + V_0(y_t)\,dt$, which we refer to as a rough differential equation (with drift), retaining the notation $y = \pi_{(V,V_0)}(y_0, \mathbf{X})$.

Given $y_0 \in \mathbb{R}^q$, the mapping

$$\mathfrak{I} : WG\Omega_p(\mathbb{R}^d) \to C([0,1],\mathbb{R}^q) : \mathbf{X} \mapsto \pi_{(V,V_0)}(y_0, \mathbf{X}),$$

is known as the Itô map. The initial raison d'être of rough path theory was that the map

$$\mathfrak{I} \circ S_n(\cdot) : x \in C^{p\text{-var}}([0,1],\mathbb{R}^d) \mapsto S_n(x) \mapsto \pi_{(V,V_0)}(y_0, S_n(x)) \in C([0,1],\mathbb{R}^q)$$

is not continuous with respect to the standard uniform topology on $C([0,1],\mathbb{R}^d)$ if $d > 1$. Counterexamples are easy to construct (see [Ml §1.5]). In fact, the mapping $\mathfrak{I} \circ S_n(\cdot)$ is continuous in the $p$-variation topology; that is, using the metric $d_{p\text{-var};[0,1]}$ (equivalently $\rho_{p\text{-var};[0,1]}$). One can think of the Itô map as a generalization of the classical Lamperti transform of SDE theory to multidimensional Brownian motion.

2.6 SDEs as RDEs 

Part of the success of rough path theory has been its power of providing a pathwise construction of stochastic calculus; we fix a sample path of Brownian motion and can define the solution of the SDE without probability theory. As before, let $W$ denote standard $d$-dimensional Brownian motion. We first define enhanced Brownian motion as the rough path $\mathbf{W} \in C([0,1], G^{(2)}(\mathbb{R}^d))$ given by

$$\begin{aligned}
\mathbf{W}_{s,t} &= 1 + W_{s,t} + \int_s^t W_{s,u} \otimes \circ\, dW_u \\
&= \exp_2\left( \sum_{k=1}^{d} W_k(s,t) + \sum_{1 \leq k < l \leq d} A_{kl}(s,t) \right) \\
&= 1 + \sum_{k=1}^{d} W_k(s,t) + \frac{1}{2} \sum_{1 \leq k,l \leq d} W_k(s,t) \otimes W_l(s,t) + \sum_{1 \leq k < l \leq d} A_{kl}(s,t) \in G^{(2)}(\mathbb{R}^d),
\end{aligned} \tag{2.2}$$

where $A_{kl} : \Delta_{[0,1]} \to [\mathbb{R}^d,\mathbb{R}^d]$ is the Lévy area of $W$:

$$A_{kl}(s,t) := \frac{1}{2} \int_s^t \left( W_k(s,u)\,dW_l(u) - W_l(s,u)\,dW_k(u) \right).$$

It can be shown that $\mathbf{W} \in G\Omega_p(\mathbb{R}^d)$ almost surely. Consider the following Stratonovich SDE driven by $W$:

$$dx(t) = V(x(t)) \circ dW(t) + V_0(x(t))\,dt := \sum_{k=1}^{d} V_k(x(t)) \circ dW_k(t) + V_0(x(t))\,dt, \qquad x(0) = x_0 \in \mathbb{R}^q, \tag{2.3}$$

where $V_0 \in \mathrm{Lip}^1(\mathbb{R}^q)$ and $V = \{V_k\}_{k=1}^d \in \mathrm{Lip}^{\gamma}(\mathbb{R}^q)$, $\gamma > 2$. Then it can be proven (see [36, Theorem 17.3]) that $x$ coincides with the unique solution $y$ of the following RDE with drift:

$$dy_t = V(y_t)\,d\mathbf{W}_t + V_0(y_t)\,dt, \qquad y_0 = x_0 \in \mathbb{R}^q.$$

In terms of the Itô map of the previous subsection:

$$\mathfrak{I} : \mathbf{W} \in G\Omega_p(\mathbb{R}^d) \mapsto \pi_{(V,V_0)}(y_0, \mathbf{W}) = y \in C([0,1],\mathbb{R}^q).$$

Remark 2.3. From the point of view of existence and uniqueness results, the appropriate way to measure the regularity of the $V = \{V_k\}_{k=1}^d$ turns out to be the notion of $\gamma$-Lipschitz in the sense of Stein Chapter XI]. Since this notion of Lipschitz is standard throughout the rough path literature, we omit the definition for the sake of brevity (see [52, §1.2.2] and [Ml Definition 1.21] for precise details). Informally, the definition states that the vector field can be approximated locally by a function taking values in polynomial functions. In contrast, Taylor expansions view a classical Lipschitz function as a function taking values in a power series; that is, a polynomial itself (cf. (HQ] § 2 ]).

The notion provides a norm on the space of such vector fields, which we denote by

$$\|V\|_{\mathrm{Lip}^{\gamma}} = \max_{k=1,\ldots,d} \|V_k\|_{\mathrm{Lip}^{\gamma}}.$$

If $\gamma > 2$ then it can be proven that there exists a unique solution to (2.3) almost surely ([36, Theorem 17.3]).

2.7 RDE Lipschitz estimates 

It is well-known that the Lipschitz constant of the Itô map $\mathfrak{I}$ for a RDE driven by a $p$-rough path $\mathbf{X}$ is of the order $O(\exp\{C \|\mathbf{X}\|^p_{p\text{-var};[0,1]}\})$. Even for the well-studied case of Gaussian rough paths, this random variable constant fails to be finite in any $L^q$-norm, $q \geq 1$. Cass, Litterer and Lyons rectified this problem by refining the deterministic estimates for the Lipschitz constant in [12]. We start by recalling their definition of the so-called greedy $p$-variation partition.

Definition 2.4. Let $\mathbf{X} \in WG\Omega_p(\mathbb{R}^d)$. For $\alpha > 0$ and $[s,t] \subset [0,1]$, set

$$\tau_0(\alpha) = s,$$

$$\tau_{n+1}(\alpha) = \inf\left\{ u : \|\mathbf{X}\|^p_{p\text{-var};[\tau_n,u]} \geq \alpha \text{ and } \tau_n(\alpha) < u \leq t \right\} \wedge t.$$

Define the integer

$$N_{\alpha,p}(\mathbf{X}, [s,t]) := \sup\left\{ n \in \mathbb{N} \cup \{0\} : \tau_n(\alpha) < t \right\}.$$
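To make the greedy partition concrete, the following is a minimal sketch under simplifying assumptions that are ours rather than the paper's: the path is real-valued and sampled on a finite grid, and $p = 1$, so that the $p$-variation over a subinterval reduces to the sum of absolute increments. The stopping times are restricted to grid points, and the function name `greedy_count` is hypothetical.

```python
def greedy_count(path, alpha):
    """Approximate N_{alpha,1}(x, [s, t]) for a sampled real-valued path.

    Assumptions (illustrative only): p = 1, so the p-variation over a
    subinterval is the sum of absolute increments, and the stopping times
    tau_n are restricted to grid points.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    accumulated = 0.0   # 1-variation accumulated since the last tau_n
    count = 0           # number of tau_n (n >= 1) lying strictly before t
    last = len(path) - 1
    for i in range(1, len(path)):
        accumulated += abs(path[i] - path[i - 1])
        if accumulated >= alpha:
            # tau_{n+1} is this grid point; it only counts if it is < t.
            if i < last:
                count += 1
            accumulated = 0.0
    return count
```

In this simplified setting the count never exceeds the total variation divided by `alpha`, in line with the crude bound stated after the definition.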

Certainly $N_{\alpha,p}(\mathbf{X}, [s,t]) \leq C \|\mathbf{X}\|^p_{p\text{-var};[s,t]}$ for some constant $C > 0$, but more importantly the tail estimates for $N_{\alpha,p}(\mathbf{X}, [s,t])$ are significantly tighter than for $\|\mathbf{X}\|^p_{p\text{-var};[s,t]}$ when we consider Gaussian rough paths $\mathbf{X}$ (cf. [ T5l [32]). The following result is a slight variation of [T 6 , Lemma 4.2].

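Proposition 2.5. Let $\gamma > p \geq 1$ and suppose $\mathbf{X}^1, \mathbf{X}^2 \in WG\Omega_p(\mathbb{R}^d)$. Define the control $\omega : \Delta_{[0,1]} \to [0,\infty)$ by

$$\omega(s,t) := \sum_{i=1}^{2} \left\|\mathbf{X}^i\right\|^p_{p\text{-var};[s,t]} + \sum_{k=1}^{\lfloor p \rfloor} \left( \frac{\rho^{(k)}_{p\text{-var};[s,t]}(\mathbf{X}^1,\mathbf{X}^2)}{\rho_{p\text{-var};[0,1]}(\mathbf{X}^1,\mathbf{X}^2)} \right)^{p/k}.$$

Finally, let $V = \{V_k\}_{k=1}^d$ be a collection of $\mathrm{Lip}^{\gamma}(\mathbb{R}^q)$ vector fields and let $V_0 \in \mathrm{Lip}^1(\mathbb{R}^q)$. Then the RDEs

$$dy^i_t = V(y^i_t)\,d\mathbf{X}^i_t + V_0(y^i_t)\,dt, \qquad y^1_0 = y^2_0 \in \mathbb{R}^q, \quad i = 1, 2,$$

have unique solutions. Moreover, for every $\alpha > 0$ there exists some constant $C = C(\alpha, \gamma, p, \|V\|_{\mathrm{Lip}^{\gamma}})$ such that

$$\rho_{p\text{-var};[0,1]}(y^1, y^2) \leq C\,\omega(0,1) \left(1 \vee \omega(0,1)^{\lfloor p \rfloor + 1}\right) \rho_{p\text{-var};[0,1]}(\mathbf{X}^1,\mathbf{X}^2) \cdot \exp\left( C \left\{ 1 + N_{\alpha,p}(\mathbf{X}^1,[0,1]) + N_{\alpha,p}(\mathbf{X}^2,[0,1]) \right\} \right).$$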

Proof. The proof is obtained from following the arguments of [T0, Lemma 4.2] with some minor modifications. Indeed, the latter result guarantees that

$$\rho_{p\text{-}\omega;[0,1]}(y^1, y^2) \leq C \rho_{p\text{-}\omega;[0,1]}(\mathbf{X}^1,\mathbf{X}^2) \exp\left( C M_{\alpha,[0,1]}(\omega) \right),$$

where

$$M_{\alpha,[s,t]}(\omega) := \sup_{\substack{D=(t_i)\subset[s,t] \\ \omega(t_i,t_{i+1}) \leq \alpha}} \sum_i \omega(t_i,t_{i+1}).$$

It can be shown that $N_{\alpha}(\omega,[0,1]) \leq M_{\alpha,[0,1]}(\omega) \leq 2N_{\alpha}(\omega,[0,1]) + 1$. Moreover, by [9, Lemma 6] there exists a constant $C > 0$ such that

$$N_{\alpha}(\omega,[0,1]) \leq C \left( 1 + N_{\alpha,p}(\mathbf{X}^1,[0,1]) + N_{\alpha,p}(\mathbf{X}^2,[0,1]) \right).$$

Therefore,

$$\rho_{p\text{-}\omega;[0,1]}(y^1, y^2) \leq C \rho_{p\text{-}\omega;[0,1]}(\mathbf{X}^1,\mathbf{X}^2) \exp\left( C \left\{ 1 + N_{\alpha,p}(\mathbf{X}^1,[0,1]) + N_{\alpha,p}(\mathbf{X}^2,[0,1]) \right\} \right). \tag{2.4}$$

Under the assumption that $\omega(0,1) \leq 1$, we can refer to the proof of [9, Theorem 4] to find that

$$\left\|\mathbf{X}^i\right\|_{p\text{-}\omega;[0,1]} \leq 1, \quad i = 1, 2, \qquad \text{and} \qquad \rho_{p\text{-}\omega;[0,1]}(\mathbf{X}^1,\mathbf{X}^2) \leq \rho_{p\text{-var};[0,1]}(\mathbf{X}^1,\mathbf{X}^2).$$

For the general case, a normalization argument gives

$$\rho_{p\text{-}\omega;[0,1]}(\mathbf{X}^1,\mathbf{X}^2) \leq \left(1 \vee \omega(0,1)^{\lfloor p \rfloor + 1}\right) \rho_{p\text{-var};[0,1]}(\mathbf{X}^1,\mathbf{X}^2). \tag{2.5}$$

As a consequence of the super-additivity of controls (cf. [36, §8.1]),

$$\rho_{p\text{-var};[0,1]}(y^1, y^2) \leq \omega(0,1)\,\rho_{p\text{-}\omega;[0,1]}(y^1, y^2),$$

and combining this inequality with (2.4) and (2.5) concludes the proof. □

3 Log-ODE method


The log-ODE method is a powerful technique for numerically simulating the solutions of SDEs or, more generally, RDEs [35, 53]. The method even holds for RDEs in infinite-dimensional Banach space (see 13 HD |59j). For this paper we restrict our study of the log-ODE method to level $m = 2$, though the scheme can be extended to an arbitrary level $m$. We give a simplified, tailored overview based on [41, 52, 58] and [76, §4]. The reader can also find a concise introduction in §7 of [53]. The method can also be found in the papers gfl H3 m on mi E3, which were inspired by the pioneering work of Chen [2D] and Magnus [61].

We begin by introducing a special Lie algebra homomorphism. A vector field $V : \mathbb{R}^q \to \mathbb{R}^q$ can be interpreted as a differential operator:

$$V(f) = \sum_{i=1}^{q} V^i \frac{\partial f}{\partial x_i}, \qquad f \in C^1(\mathbb{R}^q, \mathbb{R}).$$

This allows us to define the Lie bracket operation on two continuously differentiable vector fields:

$$[V_k, V_l] := \sum_{i,j=1}^{q} \left( V_k^i \frac{\partial V_l^j}{\partial x_i} - V_l^i \frac{\partial V_k^j}{\partial x_i} \right) \frac{\partial}{\partial x_j}.$$

Define the Lie algebra homomorphism $\phi : (\mathbb{R} \oplus \mathbb{R}^d) \oplus [\mathbb{R}^d, \mathbb{R}^d] \to \mathrm{Lip}^{\gamma-2}(\mathbb{R}^q)$ by

$$\phi(e_0) = V_0, \qquad \phi(e_k) = V_k, \quad k = 1, \ldots, d.$$

Thus $\phi$ maps Lie algebra elements to vector fields on $\mathbb{R}^q$, while preserving Lie brackets. For example, $\phi(e_0 + e_k + [e_k, e_l]) = V_0 + V_k + [V_k, V_l]$.
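The coordinate formula for the bracket can be checked numerically. The sketch below is illustrative only: it approximates Jacobians by central differences rather than differentiating analytically, and the helper names `jacobian` and `lie_bracket`, as well as the two example fields on $\mathbb{R}^2$, are our own assumptions.

```python
import numpy as np

def jacobian(field, x, eps=1e-6):
    """Central-difference Jacobian of a vector field R^q -> R^q at x.

    Entry (j, i) approximates the derivative of component j w.r.t. x_i.
    """
    x = np.asarray(x, dtype=float)
    q = x.size
    jac = np.zeros((q, q))
    for i in range(q):
        step = np.zeros(q)
        step[i] = eps
        jac[:, i] = (field(x + step) - field(x - step)) / (2 * eps)
    return jac

def lie_bracket(v, w, x):
    """[v, w](x) = Dw(x) v(x) - Dv(x) w(x), the coordinate form of the bracket."""
    x = np.asarray(x, dtype=float)
    return jacobian(w, x) @ v(x) - jacobian(v, x) @ w(x)

# Hypothetical example fields on R^2: v = d/dx_1 and w = x_1 d/dx_2.
v = lambda x: np.array([1.0, 0.0])
w = lambda x: np.array([0.0, x[0]])
```

For these two fields the bracket is the constant field $\partial/\partial x_2$, and swapping the arguments flips the sign.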

Recall (1.1): the Stratonovich SDE to be approximated is given by

$$dx(t) = V(x(t)) \circ dW(t) + V_0(x(t))\,dt = \sum_{k=1}^{d} V_k(x(t)) \circ dW_k(t) + V_0(x(t))\,dt, \qquad t \in [0,1], \tag{3.1}$$

where $x_0 \in \mathbb{R}^q$, $V_0 \in \mathrm{Lip}^1(\mathbb{R}^q)$ and $V = \{V_k\}_{k=1}^d \in \mathrm{Lip}^{\gamma}(\mathbb{R}^q)$ for $\gamma > 2$. We first consider the problem of approximating the solution of (3.1) at time $t = h$. To do this we consider the classical ODE:

$$dy(t) = \phi\left(h e_0 + \log_2 \mathbf{W}_{0,h}\right)(y(t))\,dt, \quad t \in [0,1], \qquad y(0) = x(0),$$

where $\mathbf{W}$ denotes enhanced Brownian motion. Adopting the classical flow notation, we write $y(t) = e^{tF}(x(0))$, $t \in [0,1]$, where $F = \phi(h e_0 + \log_2 \mathbf{W}_{0,h})$. Recalling (2.2), $\log_2 \mathbf{W}_{0,h}$ has the form:

$$\log_2\left(\mathbf{W}_{0,h}\right) = \sum_{k=1}^{d} W_k(0,h)\,e_k + \sum_{1 \leq k < l \leq d} A_{kl}(0,h)\,[e_k, e_l].$$

The stochastic Taylor expansion gives us the following error estimate.

Proposition 3.1. There exists a constant $C = C(\gamma)$ such that

$$\mathbb{E}\left( \|x(h) - y(1)\|^2 \right) \leq C h^3.$$

We can now construct a numerical SDE scheme $\{x^{(j)}\}_{j=0}^{N}$ by repeating the ODE approximation of Proposition 3.1 successively over each interval $[jh, (j+1)h]$, where $h = N^{-1}$. In particular, set $x^{(0)} = x(0)$, then

$$x^{(j+1)} = \exp(F_j)\left(x^{(j)}\right), \qquad j = 0, 1, \ldots, N-1, \tag{3.2}$$

where $F_j = \phi\left(h e_0 + \log_2 \mathbf{W}_{jh,(j+1)h}\right)$.

This is precisely the so-called log-ODE method. Moving from local to global error costs half an order of magnitude in the $L^2$-norm using Doob's maximal inequality.

Theorem 3.2 (Log-ODE method). There exists a constant $C = C(\gamma)$ such that

$$\mathbb{E}\left( \max_{j=0,\ldots,N} \left\| x^{(j)} - x(jh) \right\|^2 \right) \leq C h^2.$$

In other words the method provides a strong approximation scheme of order $O(h)$. A complete proof of Theorem 3.2 for the general case of arbitrary $m$ can be found in [41].
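A single step of the level-2 scheme can be sketched in code. This is a minimal illustration under our own assumptions, not the authors' implementation: the Lie brackets are formed from finite-difference Jacobians, the ODE on $[0,1]$ is solved with a fixed-step classical Runge-Kutta method, and the function names, the `substeps` parameter and the example fields are all hypothetical. The Brownian increment `w` and the Lévy area matrix `area` are taken as given inputs.

```python
import numpy as np

def jacobian(field, y, eps=1e-6):
    """Central-difference Jacobian of a vector field at y."""
    y = np.asarray(y, dtype=float)
    jac = np.zeros((y.size, y.size))
    for i in range(y.size):
        step = np.zeros(y.size)
        step[i] = eps
        jac[:, i] = (field(y + step) - field(y - step)) / (2 * eps)
    return jac

def log_ode_step(y0, fields, drift, w, area, h, substeps=20):
    """One level-2 log-ODE step: solve dy/dt = F(y) on [0, 1] with RK4, where
    F = h*V_0 + sum_k w[k]*V_k + sum_{k<l} area[k][l]*[V_k, V_l]."""
    d = len(fields)

    def F(y):
        out = h * drift(y)
        jacs = [jacobian(f, y) for f in fields]
        vals = [f(y) for f in fields]
        for k in range(d):
            out = out + w[k] * vals[k]
            for l in range(k + 1, d):
                # [V_k, V_l] = DV_l V_k - DV_k V_l
                bracket = jacs[l] @ vals[k] - jacs[k] @ vals[l]
                out = out + area[k][l] * bracket
        return out

    y = np.asarray(y0, dtype=float)
    dt = 1.0 / substeps
    for _ in range(substeps):
        k1 = F(y)
        k2 = F(y + 0.5 * dt * k1)
        k3 = F(y + 0.5 * dt * k2)
        k4 = F(y + dt * k3)
        y = y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return y

# Hypothetical test fields on R^3 (Heisenberg-type), with zero drift.
v1 = lambda y: np.array([1.0, 0.0, -0.5 * y[1]])
v2 = lambda y: np.array([0.0, 1.0, 0.5 * y[0]])
zero_drift = lambda y: np.zeros(3)
```

For the example fields, whose bracket is the constant field $(0,0,1)$, a step started from the origin returns the increment in the first two coordinates and the supplied area in the third.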

Remark 3.3. As noted in Remark 1.1 of [57], if one approximates each ODE (3.2) by its order 1 Taylor expansion, then we fall back on the traditional Euler scheme. Similarly, taking the better approximation offered by the order 2 Taylor expansion gives the Milstein scheme. However, the log-ODE method has one important advantage over these more popular schemes. As outlined in [53, §7], the previous methods are based on Taylor expansions and thus can produce approximations whose law is not absolutely continuous with respect to the measure of the solution on Wiener space. For example, we could consider a SDE whose solution is constrained by its Stratonovich formulation to lie on the unit sphere at all times. Both the Euler and Milstein schemes are numerically unstable in that they will output approximations which do not live on the sphere (see [US § 17.5]). By contrast, the log-ODE method only returns solutions which could have originated from an actual realization of the SDE (assuming that the ODE solver used is sufficiently accurate over small time steps; for example, the Runge-Kutta method or an adaptive step-size method). This is because the technique restricts the approximations to only flow along the vector fields (and their nested Lie brackets) of the original SDE. In turn, this ensures that feasibility constraints imposed on the original SDE law will also be satisfied by the log-ODE output. A more complicated example could be a system with Hamiltonian vector fields m or the stochastic volatility example of [63, §3].

Even in one dimension this instability becomes apparent. We repeat the simple but well-known counterexample offered by Hutzenthaler, Jentzen and Kloeden in [42]. Consider the following stochastic differential equation with cubic drift and additive noise, where $d = q = 1$:

$$dx_t = dW_t - x_t^3\,dt, \qquad x_0 = 0, \quad t \in [0,1].$$

The corresponding Euler scheme is given by $x^{(j+1)} = x^{(j)} - (x^{(j)})^3 h + \left(W((j+1)h) - W(jh)\right)$ with $x^{(0)} = 0$. It can be shown that this scheme does not converge strongly or weakly (see [031 §3.5.1] and [ 061 Theorem 3.4] for details):

$$\lim_{N\to\infty} \mathbb{E}\left( \left| x_1 - x^{(N)} \right|^q \right) = \infty = \lim_{N\to\infty} \left| \mathbb{E}\left( |x_1|^q \right) - \mathbb{E}\left( \left| x^{(N)} \right|^q \right) \right|$$

for every $q \in [1,\infty)$.
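The mechanism behind this divergence can be seen in a few lines. The sketch below is only an illustration (the function name `euler_cubic` is ours): it iterates the explicit Euler recursion for given increments. Since $|x - x^3 h| = |x|\,|1 - x^2 h|$, the magnitude grows at every step once $|x| > \sqrt{2/h}$, so a single large Brownian increment is enough to trigger oscillations of rapidly increasing size.

```python
def euler_cubic(increments, h, x0=0.0):
    """Explicit Euler iterates for dx = dW - x^3 dt with step size h.

    `increments` holds the Brownian increments over successive steps.
    """
    path = [x0]
    for dw in increments:
        x = path[-1]
        path.append(x - x ** 3 * h + dw)
    return path

# One (deliberately large) increment followed by no noise at all:
# with h = 0.1 the threshold sqrt(2 / h) is about 4.47.
path = euler_cubic([10.0, 0.0, 0.0], h=0.1)
```

Such large increments are rare for small $h$, but the super-exponential growth they cause is what prevents the moments of the scheme from converging.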

We conclude our introduction to the log-ODE method by considering the case of nilpotent vector fields (for simplicity, let us assume that $V_0 = 0$). In particular, suppose our vector field system is 3-nilpotent: $[V_j, [V_k, V_l]] = 0$ for every triple $(j,k,l) \in \{1,\ldots,d\}^3$. In this case the level 2 log-ODE method is exact: $x^{(j)} = x(jh)$ for $j = 1, \ldots, N$. To see this, note that the condition implies that

$$\phi\left( [e_{i_1}, [e_{i_2}, \ldots, [e_{i_{n-1}}, e_{i_n}]]] \right) = 0 \quad \text{for all } (i_1, \ldots, i_n) \in \{1,\ldots,d\}^n \text{ where } n \geq 3.$$

Thus,

$$x(h) = \exp\left( \phi\left( \log \mathbf{W}_{0,h} \right) \right)(x(0)) = \exp\left( \phi\left( \pi_2\left( \log \mathbf{W}_{0,h} \right) \right) \right)(x(0)) = x^{(1)},$$

and by induction: $x(jh) = x^{(j)}$ for all $j$. As noted in [41], we cannot guarantee that equality holds at intermediate times $t \in (jh, (j+1)h)$. This phenomenon holds in greater generality for higher levels; if a vector field system is $m$-nilpotent, then the level $m - 1$ log-ODE scheme is exact. Similarly, if a vector field system is 2-nilpotent ($[V_k, V_l] = 0$ for all $k, l$), then the level-1 and level-2 log-ODE coincide and are exact.


4 Piecewise abelian rough paths 

We can recast the log-ODE technique in the language of rough path theory using our notion of piecewise abelian rough paths. As before the unit interval $[0,1]$ is partitioned into intervals $[jh, (j+1)h]$ of equal length $h = N^{-1}$. First we define a piecewise abelian rough path.

Definition 4.1. We call a $p$-rough path $\mathbf{X} \in C([0,1], G^{(\lfloor p \rfloor)}(\mathbb{R}^d))$ a piecewise abelian $p$-rough path if the identity

$$\mathbf{X}_{s_1,t_1} \otimes \mathbf{X}_{s_2,t_2} = \mathbf{X}_{s_2,t_2} \otimes \mathbf{X}_{s_1,t_1} \tag{4.1}$$

holds for all $(s_1,t_1), (s_2,t_2) \in \Delta_{[jh,(j+1)h]}$ for each $j$.

In the case of $\lfloor p \rfloor = 1$, the definition is trivial since the group $G^{(1)}(\mathbb{R}^d) \cong \mathbb{R}^d$ is abelian. However, suppose we have a bounded variation path $X : [0,1] \to \mathbb{R}^d$ such that its level-2 enhancement $\mathbf{X} := S_2(X)$ is a piecewise abelian 2-rough path. Then (4.1) implies that

$$0 = \left[ X_{jh,u}, X_{u,(j+1)h} \right] = \left[ X_{jh,u}, X_{jh,u} + X_{u,(j+1)h} \right] = \left[ X_{jh,u}, X_{jh,(j+1)h} \right],$$

and so for every $u \in [jh, (j+1)h]$, $X_{jh,u}$ and $X_{jh,(j+1)h}$ are parallel vectors. Since both vectors start from $X_{jh}$, we conclude that $X$ is piecewise linear over the increment $[jh, (j+1)h]$. Due to this observation, at least in some heuristic sense, we can think of piecewise abelian rough paths as the natural non-commutative (that is, group-valued) equivalent of piecewise linear paths in $\mathbb{R}^d$.

We now return to our original task of rewriting the log-ODE method in terms of this new class of rough paths. The original SDE (1.1) can be rewritten as a RDE with drift, driven by enhanced Brownian motion $\mathbf{W}$:

$$dx_t = V(x_t)\,d\mathbf{W}_t + V_0(x_t)\,dt, \qquad x_0 \in \mathbb{R}^q. \tag{4.2}$$

For a proof of this equivalence we refer to Theorem 17.3 of [36]. From this perspective, the underlying idea of the log-ODE method is to approximate solutions of the SDE/RDE (1.1)/(4.2) by replacing $\mathbf{W}$ with its much simpler $N$-step piecewise abelian approximation $\mathbf{W}^h$. In order to introduce this finite-dimensional approximation we need to set notation for the Brownian and Lévy area increments as follows:

$$W_k^{(j)} := W_k(jh, (j+1)h) := W_k((j+1)h) - W_k(jh),$$

$$A_{kl}^{(j)} := \frac{1}{2} \int_{jh}^{(j+1)h} \left( W_k(jh,t)\,dW_l(t) - W_l(jh,t)\,dW_k(t) \right) = A_{kl}(jh, (j+1)h).$$

Definition 4.2. Define $\mathbf{W}^h \in C^{p\text{-var}}([0,1], G^{(2)}(\mathbb{R}^d))$ to be the piecewise abelian rough path given iteratively by $\mathbf{W}^h_0 = 1$, then

$$\mathbf{W}^h_{0,t} = \mathbf{W}^h_{0,jh} \otimes \exp_2\left( (t - jh) h^{-1} \mathfrak{l}^{(j)} \right), \qquad t \in [jh, (j+1)h],$$

where

$$\mathfrak{l}^{(j)} := \sum_{k=1}^{d} W_k^{(j)} e_k + \sum_{1 \leq k < l \leq d} A_{kl}^{(j)} [e_k, e_l] \in \mathbb{R}^d \oplus [\mathbb{R}^d, \mathbb{R}^d] = \mathfrak{g}^{(2)}(\mathbb{R}^d).$$

Chen's Theorem and the identity $\mathbf{W}^h_{s,t} = (\mathbf{W}^h_{jh,s})^{-1} \otimes \mathbf{W}^h_{jh,t}$ give $\mathbf{W}^h_{s,t} = \exp_2\left( (t - s) h^{-1} \mathfrak{l}^{(j)} \right)$ for $[s,t] \subset [jh, (j+1)h]$. Then the Baker-Campbell-Hausdorff formula confirms that $\mathbf{W}^h$ actually is a piecewise abelian rough path.
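The commutation property can be verified directly in the step-2 truncated tensor algebra. The following is a small sketch under our own conventions (not taken from the paper): a group element $(1, g^1, g^2)$ is stored as a pair (vector, matrix), the Lévy area part is an antisymmetric matrix whose $(k,l)$ entry is the coefficient of $[e_k, e_l]$, and the names `exp2`, `mul` and `close` are hypothetical.

```python
import numpy as np

def exp2(w, area):
    """Truncated exponential of w + area in the step-2 tensor algebra.

    w is a vector in R^d; area is an antisymmetric d x d matrix whose (k, l)
    entry is the coefficient of [e_k, e_l] = e_k (x) e_l - e_l (x) e_k.
    Returns the level-1 and level-2 components (the level-0 part is 1).
    """
    w = np.asarray(w, dtype=float)
    area = np.asarray(area, dtype=float)
    return w, 0.5 * np.outer(w, w) + area

def mul(g, h):
    """Tensor product (1, g1, g2) (x) (1, h1, h2), truncated at level 2."""
    g1, g2 = g
    h1, h2 = h
    return g1 + h1, g2 + h2 + np.outer(g1, h1)

def close(g, h):
    """Componentwise comparison of two group elements."""
    return np.allclose(g[0], h[0]) and np.allclose(g[1], h[1])
```

Two increments built from the same Lie element, scaled by $0.3$ and $0.7$ say, commute and multiply to the exponential of the full element, whereas exponentials of $e_1$ and $e_2$ do not commute.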

Remark 4.3. One can also think of $\mathbf{W}^h$ as the $N$-step random walk in the 2-nilpotent Lie group $G^{(2)}(\mathbb{R}^d)$ with i.i.d. increments $\exp_2(\mathfrak{l}^{(j)}) \in G^{(2)}(\mathbb{R}^d)$ (see [12, 5D]). Here the corresponding increments live in the 2-nilpotent Lie algebra $\mathfrak{g}^{(2)}(\mathbb{R}^d)$. Previous research has studied this random walk interpretation of what we have called piecewise abelian rough paths. In [13] the authors used the Central Limit Theorem in nilpotent Lie groups to prove a Donsker-type weak limit theorem for similar random walks converging to enhanced Brownian motion in a particular rough path Hölder topology.

The rough path $\mathbf{W}^h$ is certainly a 2-geometric rough path as it can be approximated in the 2-variation topology by the signatures of a sequence of bounded variation paths. Moreover, since the rough path is defined by linear interpolation in the Lie algebra $\mathfrak{g}^{(2)}(\mathbb{R}^d)$, over each interval $[jh, (j+1)h]$ $\mathbf{W}^h$ is also the shortest rough path candidate (measured in $p$-variation) with its corresponding group increment $\mathbf{W}^h_{jh,(j+1)h}$ matching $\exp_2(\mathfrak{l}^{(j)})$, $\mathfrak{l}^{(j)} \in \mathfrak{g}^{(2)}(\mathbb{R}^d)$. Therefore $\mathbf{W}^h$ is also piecewise geodesic.

Remark 4.4. Note that $\mathbf{W}^h$ is (almost surely) not the lift of an actual path in $\mathbb{R}^d$; that is, $\mathbf{W}^h \neq S_2(Y)$ for any stochastic process $Y : [0,1] \to \mathbb{R}^d$. Indeed if otherwise, $Y$ would enclose non-zero area over each increment yet be piecewise linear, implying a contradiction. Given an increment $\exp_2(\mathfrak{l}^{(j)}) \in G^{(2)}(\mathbb{R}^d)$, the interesting problem of finding a helix path $\gamma \in C([0,1], \mathbb{R}^d)$ with minimal length such that $S_2(\gamma)_{0,1} = \exp_2(\mathfrak{l}^{(j)})$ is known as reconstruction ([a §5] and [56]).

Having defined $\mathbf{W}^h$, we now reformulate the log-ODE method in terms of a rough differential equation:

Proposition 4.5. Let $y \in C([0,1], \mathbb{R}^q)$ be the solution of the RDE with drift

$$dy_t = V(y_t)\,d\mathbf{W}^h_t + V_0(y_t)\,dt, \qquad y_0 = x_0 \in \mathbb{R}^q, \tag{4.3}$$

and let $x^h \in C([0,1], \mathbb{R}^q)$ denote the entire path of the approximation produced by the log-ODE method given by (3.2). Then $y$ and $x^h$ coincide at all times: $y(t) = x^h(t)$ for all $t \in [0,1]$.

Proof. The claim follows immediately from [31, Theorem 2]. □

Remark 4.6. To prove the weaker statement:

$$y(jh) = x^h(jh) = x^{(j)} \quad \text{for all } j = 1, \ldots, N,$$

a more intuitive proof would be to compare the stochastic Taylor expansions of $y(jh)$ and $x^{(j)}$. One finds that the expansions agree at the times $t = jh$ up to every order (cf. [35]) and hence the approximations must coincide.

We can also consider the rough path lift of $\mathbf{W}^h$ to level $\kappa > 2$:

$$S_{\kappa}(\mathbf{W}^h) \in C^{p\text{-var}}([0,1], G^{(\kappa)}(\mathbb{R}^d)).$$

This lift is unique in the sense that the $p$-variation of $S_{\kappa}(\mathbf{W}^h)$ is equal (up to multiplicative constants) to that of $\mathbf{W}^h$. Indeed, by the Lipschitz-continuity of the lift operator ([36, Theorem 9.5]), given $p \in [2,3)$ there exists a constant $C = C(\kappa, p)$ such that

$$\left\|\mathbf{W}^h\right\|_{p\text{-var};[0,1]} \leq \left\|S_{\kappa}(\mathbf{W}^h)\right\|_{p\text{-var};[0,1]} \leq C \left\|\mathbf{W}^h\right\|_{p\text{-var};[0,1]}.$$

We can ask what is $S_{\kappa}(\mathbf{W}^h)$ precisely? To answer this we treat the increments

$$\mathfrak{l}^{(j)} = W^{(j)} + A^{(j)} \in \mathbb{R}^d \oplus [\mathbb{R}^d, \mathbb{R}^d] = \mathfrak{g}^{(2)}(\mathbb{R}^d)$$

as Lie increments in the larger free $\kappa$-step nilpotent Lie algebra $\mathfrak{g}^{(\kappa)}(\mathbb{R}^d)$. Then we can define a piecewise abelian $\kappa$-rough path $\mathbf{Z}^h \in C([0,1], G^{(\kappa)}(\mathbb{R}^d))$ by $\mathbf{Z}^h_0 = 1$ and

$$\mathbf{Z}^h_{0,t} = \mathbf{Z}^h_{0,jh} \otimes \exp_{\kappa}\left( (t - jh) h^{-1} \mathfrak{l}^{(j)} \right), \qquad t \in [jh, (j+1)h].$$

We claim that $S_{\kappa}(\mathbf{W}^h) = \mathbf{Z}^h$. To prove this, first note that the first two levels of $\mathbf{W}^h$ and $\mathbf{Z}^h$ agree: $\pi_{0,2}(\mathbf{Z}^h_{s,t} - \mathbf{W}^h_{s,t}) = 0$ for all $s, t \in [0,1]$. Moreover, by its construction $\mathbf{Z}^h$ is certainly a multiplicative functional. By the rough path extension theorem of [53, Theorem 3.7] it remains to show that the $p$-variation of the higher tensor level increments of $\mathbf{Z}^h$ is controlled by that of $\mathbf{W}^h$. By using Chen's Theorem, it suffices to consider the $p$-variation over the single interval $[0,h]$. Exploiting the piecewise geodesic nature of $\mathbf{W}^h$ and $\mathbf{Z}^h$, there exist constants $c_i = c_i(\kappa) > 0$, $i = 1, 2$, such that

$$\left\|\mathbf{Z}^h\right\|_{p\text{-var};[0,h]} = \left\| \exp_{\kappa}(\mathfrak{l}^{(0)}) \right\|_C \leq c_1 \left( \left\|W^{(0)}\right\| \vee \sqrt{\left\|A^{(0)}\right\|} \right) \leq c_2 \left\| \exp_2(\mathfrak{l}^{(0)}) \right\|_C = c_2 \left\|\mathbf{W}^h\right\|_{p\text{-var};[0,h]},$$

where the Carnot-Carathéodory norms are taken over $G^{(\kappa)}(\mathbb{R}^d)$ and $G^{(2)}(\mathbb{R}^d)$ respectively. We conclude that $S_{\kappa}(\mathbf{W}^h) = \mathbf{Z}^h$, as claimed.

Remark 4.7. Our interest in the enhancement $S_{\kappa}(\mathbf{W}^h)$ comes from the following observation that will become critical in the proof of our main result: if we replace the driving rough path $\mathbf{W}^h$ by $S_{\kappa}(\mathbf{W}^h)$ in the RDE (4.3), the solution of Proposition 4.5 remains the same (although we must assume stronger conditions on our vector fields in order for the RDE to have a unique solution: namely that $V = \{V_k\}_{k=1}^d \in \mathrm{Lip}^{\gamma}$, where $\gamma > \kappa > 2$ instead of simply $\gamma > 2$). To see this directly, we note that by their construction the log-signatures of $\mathbf{W}^h$ and $S_{\kappa}(\mathbf{W}^h)$ coincide over each increment $[jh, (j+1)h]$:

$$\log\left( S_{\kappa}(\mathbf{W}^h)_{jh,(j+1)h} \right) = \log\left( \mathbf{W}^h_{jh,(j+1)h} \right).$$

This is a slight abuse of notation since the left-hand side lives in a larger Lie algebra, albeit with zero terms of multiplicity greater than 2. Therefore RDEs driven by $S_{\kappa}(\mathbf{W}^h)$ are precisely the same as those driven by the original $\mathbf{W}^h$. For a formal proof see the perturbation result of [36, Theorem 12.14] and [31, Theorem 2] (the latter covers the drift case). An important consequence of this is that the log-ODE technique produces the same approximation points whether we use $\mathbf{W}^h$ or $S_{\kappa}(\mathbf{W}^h)$ to drive the RDE in Proposition 4.5. In fact, this phenomenon holds for arbitrary rough paths and their lifts.

5 Gaussian approximations of Lévy area


In practice, it is difficult to drive RDEs with the piecewise abelian rough path $\mathbf{W}^h$ because we must be able to generate Lévy area increments $A^{(j)}$ (which is numerically challenging if $d > 2$). Following the recent papers [23, 24] of Davie, we propose another piecewise abelian 2-rough path $\mathbf{X}^h$ which substitutes each $A^{(j)}$ with a suitable Gaussian random variable $B^{(j)}$, thereby being much easier to generate. In particular, the $A^{(j)}$ and $B^{(j)}$ share the same mean and covariance structure (that is, we are moment matching up to order 2). But before we go into detail and define $\mathbf{X}^h$, let us closely examine the area increments. The following lemma gives a simple decomposition of $A^{(j)}$ into parts dependent and independent of the corresponding Brownian increment $W^{(j)} := W(jh, (j+1)h)$.


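Lemma 5.1. For all $1 \leq k < l \leq d$:

$$A_{kl}^{(j)} = \zeta_k^{(j)} W_l^{(j)} - \zeta_l^{(j)} W_k^{(j)} + \kappa_{kl}^{(j)},$$

where the $\zeta_k^{(j)}$, $k = 1, \ldots, d$, and $\kappa_{kl}^{(j)}$, $1 \leq k < l \leq d$, are mutually uncorrelated (but not independent), independent of $W^{(j)}$ and have mean zero. Moreover, $\mathrm{Var}(\zeta_k^{(j)}) = \frac{h}{12}$ and $\mathrm{Var}(\kappa_{kl}^{(j)}) = \frac{h^2}{12}$.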

Proof. We suppose $j = 0$ for simplicity and begin by decomposing the $A^{(0)}$ increment into parts dependent and independent of the Brownian increment $W(h)$. To this end, following [23, §7], we can write $W_k(t) = h^{1/2} B_k(t/h) + t h^{-1/2} V_k$ for $t \in [0,h]$, where $B_1, \ldots, B_d$ are independent standard Brownian bridges on $[0,1]$ and $V_k = h^{-1/2} W_k(h)$ are independent $N(0,1)$ (and are independent of the $B_k$). Also write $B_0(t) = t$ and set

$$K_{\alpha} = \int_0^1 \int_0^{t_l} \cdots \int_0^{t_2} dB_{j_1}(t_1) \cdots dB_{j_l}(t_l)$$

for an index $\alpha = (j_1, \ldots, j_l) \in \{0, 1, \ldots, d\}^l$. For such an index it can be shown that

$$I_{\alpha} := \int_0^h \int_0^{t_l} \cdots \int_0^{t_2} dW_{j_1}(t_1) \cdots dW_{j_l}(t_l) = h^{(l(\alpha) + n(\alpha))/2} \sum_{\beta} K_{\beta} \prod_{k : i_k < j_k} V_{j_k},$$

where the sum is over all $\beta = (i_1, \ldots, i_l)$ such that for each $k \in \{1, \ldots, l\}$ we have either $i_k = j_k$ or $i_k = 0 < j_k$. Here we have used $l(\alpha)$ and $n(\alpha)$ to denote the length and number of zero entries of $\alpha$ respectively. Noting that $K_{kl} = -K_{lk}$ for $0 < k < l$, it follows that

$$A_{12}^{(0)} = \frac{1}{2}\left( I_{12} - I_{21} \right) = h \left( K_{10} V_2 - K_{20} V_1 + K_{12} \right),$$

where

$$K_{12} = \int_0^1 B_1(t)\,dB_2(t) \quad \text{and} \quad K_{j0} = \int_0^1 B_j(t)\,dt \quad \text{for } j = 1, 2.$$

Thus $\zeta_j^{(0)} := h^{1/2} K_{j0}$ for $j = 1, 2$, and $\kappa_{12}^{(0)} := h K_{12}$ gives the claimed decomposition. The variances follow from Itô's isometry. For details we refer to Lemma 7 of [23]. □
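As a quick consistency check on these variances (a side computation, not part of the original proof), one can compare with the classical value $h^2/4$ for the variance of the Lévy area of a two-dimensional Brownian motion over an interval of length $h$. The three terms of the decomposition are pairwise uncorrelated, and each $\zeta_k^{(0)}$ is independent of the mean-zero increments $W_l^{(0)}$, whose variance is $h$, so the variances add:

```latex
\begin{aligned}
\operatorname{Var}\big(A_{12}^{(0)}\big)
 &= \operatorname{Var}\big(\zeta_1^{(0)}\big)\,\mathbb{E}\big[(W_2^{(0)})^2\big]
  + \operatorname{Var}\big(\zeta_2^{(0)}\big)\,\mathbb{E}\big[(W_1^{(0)})^2\big]
  + \operatorname{Var}\big(\kappa_{12}^{(0)}\big) \\
 &= \frac{h}{12}\cdot h + \frac{h}{12}\cdot h + \frac{h^2}{12}
  = \frac{h^2}{4}.
\end{aligned}
```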

The fact that $\zeta^{(j)}$ and $\kappa^{(j)}$ are not independent makes them (and consequently $A^{(j)}$) very difficult to simulate numerically. A natural solution would be to approximate these two variables with normal random variables $z_k^{(j)}$, $z_{kl}^{(j)}$ with the correct mean and variance, to produce a Gaussian approximation $B^{(j)}$ for $A^{(j)}$. Since uncorrelated jointly Gaussian random variables are necessarily independent, simulation is much easier. This is precisely what Davie proposes:

$$B_{kl}^{(j)} := z_k^{(j)} \widetilde{W}_l^{(j)} - z_l^{(j)} \widetilde{W}_k^{(j)} + z_{kl}^{(j)}, \tag{5.1}$$

where $z_k^{(j)}$ and $z_{kl}^{(j)}$ are independent normal random variables with $z_k^{(j)} \sim N(0, \frac{h}{12})$ and $z_{kl}^{(j)} \sim N(0, \frac{h^2}{12})$. Here $\widetilde{W}_k^{(j)} \sim N(0,h)$ but this increment may not necessarily be equal to the original Brownian increment $W^{(j)}$. As before, the $z_k^{(j)}$, $z_{kl}^{(j)}$ are all independent of $\widetilde{W}^{(j)}$.
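Sampling the Gaussian surrogate of (5.1) is straightforward, which is the point of the construction. The sketch below is a minimal illustration under our own conventions: the function name `gaussian_area` is hypothetical, the result is returned as an antisymmetric matrix, and the variances $h/12$ and $h^2/12$ are those of Lemma 5.1.

```python
import numpy as np

def gaussian_area(w, h, rng):
    """Sample B_{kl} = z_k w_l - z_l w_k + z_{kl} for a given increment w.

    Returns an antisymmetric d x d matrix; only the entries with k < l are
    sampled, the rest are filled in by antisymmetry.
    """
    w = np.asarray(w, dtype=float)
    d = w.size
    z = rng.normal(0.0, np.sqrt(h / 12.0), size=d)        # z_k  ~ N(0, h/12)
    zz = rng.normal(0.0, h / np.sqrt(12.0), size=(d, d))  # z_kl ~ N(0, h^2/12)
    b = np.zeros((d, d))
    for k in range(d):
        for l in range(k + 1, d):
            b[k, l] = z[k] * w[l] - z[l] * w[k] + zz[k, l]
            b[l, k] = -b[k, l]
    return b
```

When the increment `w` is itself drawn from $N(0, h I_d)$, the sample variance of each entry should be close to $h^2/4$, the variance of the true Lévy area increment.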

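In the same fashion as the construction of $\mathbf{W}^h$ in Definition 4.2 we define $\mathbf{X}^h \in C([0,1], G^{(2)}(\mathbb{R}^d))$ to be the piecewise abelian, geometric 2-rough path given by $\mathbf{X}^h_0 = 1$ and

$$\mathbf{X}^h_{0,t} = \mathbf{X}^h_{0,jh} \otimes \exp_2\left( (t - jh) h^{-1} \eta^{(j)} \right), \qquad t \in [jh, (j+1)h],$$

where the increments $\eta^{(j)} \in \mathfrak{g}^{(2)}(\mathbb{R}^d)$ are defined as:

$$\eta^{(j)} := \sum_{k=1}^{d} \widetilde{W}_k^{(j)} e_k + \sum_{1 \leq k < l \leq d} B_{kl}^{(j)} [e_k, e_l].$$

As with $\mathbf{W}^h$, we may consider the $\kappa$-lift $S_{\kappa}(\mathbf{X}^h)$ of $\mathbf{X}^h$ and, as before, RDEs driven by $\mathbf{X}^h$ and $S_{\kappa}(\mathbf{X}^h)$ coincide. Moreover, again using Theorem 2 of [31] we can prove that the approximation scheme $\{x^{(j)}\}_{j=0}^{N}$ given by (1.4) coincides with the solution of the following RDE with drift:

$$dz_t = V(z_t)\,d\mathbf{X}^h_t + V_0(z_t)\,dt, \qquad z_0 = x_0 \in \mathbb{R}^q.$$

In particular, $z_{jh} = x^{(j)}$ for $j = 1, \ldots, N$.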


6 Wasserstein coupling of Lévy area

In this section we construct a probabilistic coupling of the constituent random variables of $\mathbf{W}^h$ and $\mathbf{X}^h$ so as to achieve a coupling of the two piecewise abelian rough paths. We directly follow the dyadic coupling argument of [23] in which Davie presented a numerical approximation scheme for SDEs based on a variant of the famous 1975 theorem of Komlós, Major and Tusnády [48]. The latter result is a form of the simultaneous Central Limit Theorem using couplings. As Davie writes, it states that if $P$ is a suitably non-degenerate probability measure on $\mathbb{R}$ with mean zero, variance 1 and zero third moment, then there exists a universal constant $C > 0$ such that the following holds: for each $n \in \mathbb{N}$, one can construct a probability space on which there exists a sequence of i.i.d. random variables $X_1, \ldots, X_n$ with law $P$ and a corresponding sequence of i.i.d. $N(0,1)$ variables $Y_1, \ldots, Y_n$ such that

$$\left\| \max_{k=1,\ldots,n} \left| \sum_{i=1}^{k} (X_i - Y_i) \right| \right\|_{L^2} \leq C.$$

This original KMT Theorem was then extended to vector random variables by Einmahl in [28] and then Zaitsev established the result for the case of non-identical distributions which are uniformly non-degenerate in a series of papers [88]. For approximating non-Gaussian distributions $P$, both coupling results are proven to be optimal among all couplings (see [86, 87]).


Remark 6.1. Following Zaitsev's work, Davie proves his own variation which allows the underlying distributions to be random themselves. One cannot just apply the original KMT theorem or Zaitsev's version to the random walk composed of the Lévy area increments because one would be unable to say anything about how close the increments of the Brownian motion $W^{(j)}$ and the Gaussian approximation $\widetilde{W}^{(j)}$ are. Moreover, in the case of $d = 2$, directly applying the classical KMT theorem to the one-dimensional random walk composed of the increments $A_{12}^{(j)}$ would give a Wasserstein rate of convergence of $O(-\sqrt{h} \log h)$ by scaling (see [88, Theorem 1]). Therefore a more sophisticated argument is needed.

For our coupling of $\mathbf{W}^h$ and $\mathbf{X}^h$ we change a small part of Davie's argument. In the original paper [23], the approximation points of the Milstein scheme were coupled as a vector with the corresponding points of the SDE solution. That is, speaking from the perspective of rough path theory, the coupling took place at the output side of the Itô map $\mathfrak{I}$. In contrast, our approach is to use Davie's coupling argument at the input side of $\mathfrak{I}$. Moreover, in his case Davie coupled the Brownian increments $W^{(j)}$ and $\widetilde{W}^{(j)}$ (that is, they were not necessarily equal). In our application, we assume that $\mathbf{W}^h$ and $\mathbf{X}^h$ share the same increments: $\widetilde{W}^{(j)} = W^{(j)}$ for all $j$. As we will see, this simplification makes calculations using the Baker-Campbell-Hausdorff formula (Theorem 1 11.1 1) much easier to handle. What remains is to couple the Lévy area increments $A^{(j)}$ with the Gaussian approximations $B^{(j)}$ defined above.

Before stating the coupling result let us introduce some notation. Without loss of generality suppose that $N = h^{-1} = 2^m$ for some integer $m$. This can always be arranged by extending the SDE to the interval $[0, 2^m h]$, where $m$ is the smallest integer such that $2^m \geq N$. Define a dyadic set to be a subset $E \subset \{0, 1, \ldots, 2^m - 1\}$ of the form

$$E = \left\{ k 2^n, k 2^n + 1, \ldots, (k+1) 2^n - 1 \right\},$$

for some $n \in \{0, 1, \ldots, m\}$ and $k \in \{0, 1, \ldots, 2^{m-n} - 1\}$. Define the $[\mathbb{R}^d, \mathbb{R}^d]$-valued partial sums

$$\gamma_E := \sum_{r \in E} A^{(r)}, \qquad \lambda_E := \sum_{r \in E} B^{(r)} \qquad \text{for all } E \subset \{0, 1, \ldots, 2^m - 1\}.$$
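The family of dyadic sets is small and easy to enumerate. As a hedged illustration (the function names `dyadic_sets` and `partial_sum` are ours), the sketch below lists every dyadic subset of $\{0, 1, \ldots, 2^m - 1\}$ as a Python `range` and forms a partial sum over such a set; there are $2^{m+1} - 1$ dyadic sets in total, one for each admissible pair $(n, k)$.

```python
def dyadic_sets(m):
    """All dyadic subsets of {0, 1, ..., 2**m - 1}, each as a range."""
    sets = []
    for n in range(m + 1):                 # block size 2**n
        for k in range(2 ** (m - n)):      # block index
            sets.append(range(k * 2 ** n, (k + 1) * 2 ** n))
    return sets

def partial_sum(values, subset):
    """Sum of values[r] over r in subset (the analogue of gamma_E or lambda_E)."""
    return sum(values[r] for r in subset)
```

Here `values` can hold scalars or NumPy arrays (for instance one area matrix per step), since only addition is used.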


Proposition 6.2. There exists a constant C > 0 and a probability space on which one can 
define , and 7 ^, A# for all subsets E C {0,1,..., 2 m — 1} (both using the W^), 

such that 

117 E - AE|| (Rd) ®2 L5/2 < Chloglfir 1 ). 

In the special case of E being a dyadic subset we have ||| 7 e — As ||| l5 /2 < Ch. 

Remark 6.3. Regardless of our coupling, by scaling we automatically have 


|||7e - Ae||| L 5/2 < |||7e||| L 5/2 + 11|A^|| l^s /2 < Ch when E = {i} , 

and so Proposition 16.21 is trivial when considered locally, (that is on single intervals). The point 
is that the coupling performs well globally and it is this property which we will exploit when 
considering p-variation of the coupled lifted piecewise abelian rough paths. Heuristically, one 
can think of this as a probabilistic analogue of the fact that there exists paths which are far 
apart in 1 -variation but very close in p -variation for large p> 1 . 

We repeat: the proof is essentially a special case of Davie's original Theorem 1 of [23].

Proof. Let Q denote the a -algebra generated by the increments W^\ ..., For each r 

define a random vector e by Xjf' 1 = for k = 1,. .., d and X^ = 

~2 /C IJ-rfc 

(g)-i/2tfW for 1 < k < l < d. Then, (conditional on Q\ X^ has mean zero and covariance 
matrix Id^ d+l y We can then write 


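\[ h^{-1} A^{(r)} = G_r X^{(r)}, \]

where $G_r$ is a $\tfrac12 d(d-1) \times \tfrac12 d(d+1)$ matrix defined in terms of the $W^{(r)}$. Specifically

\[ G_r = \frac{1}{\sqrt{12}} \begin{pmatrix} M_r & I_{\frac12 d(d-1)} \end{pmatrix}, \]

setting $M_r$ to be the $\tfrac12 d(d-1) \times d$ matrix whose row indexed by the pair $(k,l)$, $1 \le k < l \le d$, is
$h^{-1/2}\big(W^{(r)}_l e_k^{\top} - W^{(r)}_k e_l^{\top}\big)$, where $e_k$ denotes the $k$-th standard basis vector of $\mathbb{R}^d$.

This makes $M_r$ have the form:

\[ M_r = h^{-1/2} \begin{pmatrix}
W_2^{(r)} & -W_1^{(r)} & 0 & \cdots & 0 & 0 \\
W_3^{(r)} & 0 & -W_1^{(r)} & \cdots & 0 & 0 \\
\vdots & & & & & \vdots \\
0 & 0 & 0 & \cdots & W_d^{(r)} & -W_{d-1}^{(r)}
\end{pmatrix}. \]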


In the same way we have h l B^ = G r X^ r \ where is N(0,ld^ d+1) ). 

It follows that $G_r G_r^{\top} = \tfrac{1}{12}(I + M_r M_r^{\top})$ is a positive-definite symmetric matrix. Since
$h^{-1/2} W^{(r)} \sim N(0, I_d)$, certainly $\mathbb{E}(\|G_r\|^q) \le C(q)$ for all $q \ge 1$. Moreover, the eigenvalues
of $G_r G_r^{\top}$ are bounded below by $\tfrac{1}{12}$, hence $\|(G_r G_r^{\top})^{-1}\| \le 12$. Note that conditional on $Q$,
$A^{(r)}$ and $B^{(r)}$ have the same covariance matrix $h^2 G_r G_r^{\top}$.
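The structure of $M_r$ and the eigenvalue bound can be checked numerically. The following Python sketch is only an illustration under our own naming (`build_M` and `quadratic_form` are hypothetical helpers, not objects from the paper): it builds the matrix with one row per pair $k < l$ as described above and evaluates $v^{\top} \tfrac{1}{12}(I + M M^{\top}) v$, which is never smaller than $|v|^2/12$.

```python
from itertools import combinations
import math


def build_M(W, h):
    """Return the d(d-1)/2 x d matrix with one row per pair k < l (0-based):
    h**-0.5 * W[l] in column k, -h**-0.5 * W[k] in column l, zeros elsewhere."""
    d = len(W)
    scale = 1.0 / math.sqrt(h)
    rows = []
    for k, l in combinations(range(d), 2):
        row = [0.0] * d
        row[k] = scale * W[l]
        row[l] = -scale * W[k]
        rows.append(row)
    return rows


def quadratic_form(M, v):
    """Evaluate v^T (I + M M^T) v / 12 as (|v|^2 + |M^T v|^2) / 12."""
    d = len(M[0])
    Mt_v = [sum(M[i][c] * v[i] for i in range(len(M))) for c in range(d)]
    return (sum(x * x for x in v) + sum(x * x for x in Mt_v)) / 12.0
```

Since the second summand is a squared norm, the quadratic form is at least $|v|^2/12$ for every $v$, which is the lower bound on the eigenvalues used in the text.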

For each dyadic set $E$ of size $2^n$ define the matrix $H_E = 2^{-n} \sum_{r \in E} G_r G_r^{\top}$. Since, conditional
on $Q$, the random variables $A^{(0)}, \dots, A^{(2^m-1)}$ are independent, $H_E$ is the (conditional) covariance
matrix of $Y_E := 2^{-n/2} h^{-1} \gamma_E$. Similarly $H_E$ is also the (conditional) covariance matrix of
$Z_E := 2^{-n/2} h^{-1} \lambda_E$. Note that $H_E^{-1}$ is well defined since the $G_r G_r^{\top}$ are positive-definite symmetric
matrices. Moreover, $H_E = \tfrac{1}{12}\big(I + 2^{-n} \sum_{r \in E} M_r M_r^{\top}\big)$, so the eigenvalues of $H_E$ are also bounded
below by $\tfrac{1}{12}$ and hence $\|H_E^{-1}\| \le 12$. It follows that $\mathbb{E}(\|H_E\|^q) \le C(q)$, $q \ge 1$.

Having established suitable $L^q$-bounds on the matrices $H_E$ and $H_E^{-1}$, the proof then follows
precisely the same course as that of [23]. The idea of Davie's proof is to construct couplings of
$Y_E$ and $Z_E$ recursively, starting with the base case $E_0 = \{0, 1, \dots, 2^m - 1\}$ and proceeding by
successive bisection. Since the proof is precisely the same we omit the details for the sake of
brevity.


Proposition 6.4 (Davie [23]). Suppose that for each $q \ge 1$, there exists a constant $C = C(q)$
such that $\mathbb{E}(\|H_E\|^q), \mathbb{E}(\|H_E^{-1}\|^q) \le C$ for every dyadic set $E$. Then there exists a constant
$C > 0$ and a probability space on which we can define $\{W^{(j)}\}_{j=0}^{2^m-1}$, and $\gamma_E, \lambda_E$ for all subsets
$E \subseteq \{0, 1, \dots, 2^m - 1\}$, such that

\[ |||Y_E - Z_E|||_{L^{5/2}} \le C\, 2^{-n/2} \]

for every dyadic set $E$ of size $2^n$.

Returning to the proof of Proposition 6.2, Davie's result yields

\[ |||\gamma_E - \lambda_E|||_{L^{5/2}} = 2^{n/2} h\, |||Y_E - Z_E|||_{L^{5/2}} \le Ch, \]

whenever $E$ is a dyadic set of size $2^n$. A general, not necessarily dyadic, subset $E$ can be
expressed as the disjoint union of at most $\log_2 N = \log_2(h^{-1})$ dyadic sets $E_1, \dots, E_k$ of different
sizes. It follows that

\[ |||\gamma_E - \lambda_E|||_{L^{5/2}} \le \sum_{j=1}^{k} |||\gamma_{E_j} - \lambda_{E_j}|||_{L^{5/2}} \le Ch\log(h^{-1}). \]

The proof is complete. □
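The dyadic decomposition used in the last step can be made concrete in a special case. The Python sketch below is an illustration only: the function name `dyadic_blocks` is ours, and it treats just initial segments $\{0, \dots, n-1\}$, where the binary expansion of $n$ yields dyadic sets of strictly decreasing sizes. It does not cover arbitrary subsets.

```python
def dyadic_blocks(n, m):
    """Split {0, ..., n-1}, with n <= 2**m, into dyadic sets
    {k*2**j, ..., (k+1)*2**j - 1}, one for each binary digit of n."""
    if not 0 <= n <= 2 ** m:
        raise ValueError("need 0 <= n <= 2**m")
    blocks, start = [], 0
    for j in range(m, -1, -1):  # try the largest block size first
        size = 2 ** j
        if n - start >= size:
            # start is a sum of larger powers of two, hence a multiple of size
            blocks.append(range(start, start + size))
            start += size
    return blocks
```

For $n < 2^m$ the number of blocks equals the number of non-zero binary digits of $n$, which is at most $m = \log_2 N$, matching the logarithmic factor in the bound.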
Remark 6.5. One could argue that we could have saved ourselves trouble by simply applying
Davie's original coupling result to the SDE defining the Lévy area. This approach is perfectly
sound since this SDE satisfies the non-degeneracy condition (1.9) and we would end up with
an approximation of the Lévy area increments such that


A e ^ satisfies 

r£E 


7 E — A E 


< Chlog(h 1 ), 


ME C {0,1,..., TV - 1}. 


However, we would not be able to guarantee that the Brownian increments underlying the two
partial sums would necessarily be equal. As mentioned, we will see in the following sections that
having equal Brownian increments in the definition of $\mathbf{W}^h$ and $\mathbf{X}^h$ greatly reduces the
complexity of some of the computations involving the Baker-Campbell-Hausdorff formula.

As defined in the introduction, let x h and 0 h denote the [W 1 , W 1 ]- valued N(= /r^ 1 )-step 
random walks made up of the increments and respectively. Proposition 16.21 establishes 
a non-local coupling in the sense that the \ h and O h are not adapted to the same filtration. 
Importantly, this means that the error given by the discrete process 8 j := Xj~ is not 
a martingale, and so we cannot employ Doob’s maximal L 2 -martingale inequality to arrive 
quickly and painlessly at a useful maximal inequality for the coupling error. 


7 Coupling piecewise abelian rough paths 


The coupling provided by Proposition 6.2 automatically induces a coupling between $\mathbf{W}^h$ and $\mathbf{X}^h$.
One may ask how well this coupling performs in a given rough path metric topology. Since we
are interested in numerically approximating SDEs using these rough paths, we need to employ
the useful RDE Lipschitz estimates of [9], [15], [32]. These estimates use the inhomogeneous
$p$-variation metric and so this is the metric we focus on (see Proposition 2.5).

It can be shown that there is an upper bound of $O(-h\log h)$ on the performance of any
coupling of $\mathbf{W}^h$ and $\mathbf{X}^h$ under the restriction that their underlying Gaussian increments agree.
To be precise, the performance is measured in the Wasserstein metric using the inhomogeneous
$p$-variation metric as the cost function. This is the content of Corollary 7.2 below. The first step
of proving this upper bound is the following proposition which is built directly upon a similar
result established in the final example of [23]. For ease of notation, denote the Cartesian product
$[\mathbb{R}^d, \mathbb{R}^d]^{\times N}$ by $[\mathbb{R}^d, \mathbb{R}^d]^N$.

Proposition 7.1. Suppose d> 2 and let p and v denote the laws of {Xj}f=i and in 

[R d , R d ] N . Set 

\= {T : [M. d , M. d ] N -X [W :l , R rf ] ,Y : = v and T measurable} . 

Then for all q > 1, there exists a constant c = c(q ) > 0 such that 




[ inf 



( max 

k=l,...,N 


k 

(Xj - *{x)j) 

3 = 1 


1/9 


p(dx) 


> ch log (h 


-i\ 


Proof. It suffices to prove the proposition for $d = 2$. Consider the SDE defining Lévy area:

\[ dx_1 = dW_1, \qquad dx_2 = dW_2, \qquad dx_3 = \tfrac12 (x_1\, dW_2 - x_2\, dW_1), \]

on the time interval $[0,1]$, with initial condition $x(0) = 0$. It follows that $x_3(t) = A^{12}(0,t)$
and by direct calculation it can be shown that the corresponding Milstein scheme $\{y^{(j)}\}$ is
exact in that $y^{(j)} = x(jh)$. In particular, $y_3^{(j)} = A^{12}(0, jh)$ for all $j$. At each time $t = jh$,
the accumulated area of the random walk built from the $A^{(j)}$ corresponds to $A^{12}(0, jh) = y_3^{(j)}$, while
the accumulated area of the random walk built from the $B^{(j)}$ equals $x_3^{(j)}$, the point produced by the
approximation scheme proposed in [23] using the $B^{(j)}$ increments (where we do not assume that the
Brownian increments used by this scheme are necessarily equal to the $W^{(j)}$ in our coupling).

Davie proves that there is a constant $c > 0$ such that, for every integer $N$ and any coupling
between the random variables used to define $B^{(j)}$ and the Brownian motion $W$
and its Lévy area increments $A^{(j)}$, we have

\[ \mathbb{P}\Big( \max_{1 \le j \le N} \|x^{(j)} - x(jh)\|_{\mathbb{R}^3} > c\, h\log(h^{-1}) \Big) > 2^{-1}. \tag{7.1} \]

Markov's inequality then finishes the proof. □


Corollary 7.2. Suppose d > 2, p e [2,3), and q> 1. Let /i, v denote the respective measures 
of the piecewise abelian rough paths W h and H h on GD p (M d ), and set 

AC(/i, v) := {T : GLl p (W l ) — > GLl p (R d ) : 'lq(yu) = u, T measurable and 7T\ (T(X)) = 7Ti(X)j. 
Then there exists a constant c = c(q ) > 0 such that 


VKM ■= 


inf 


’Gn p (R d ) 


Pp-var;[0,l] 


\ 1 /l 

i,T(X))V(dX) ) > ch\og(h 


- 1 ' 




Proof. We apply Proposition 7.1 and restrict ourselves to couplings in which the underlying
Gaussian increments of $A^{(j)}$ and $B^{(j)}$ are equal for all $j$. Since $\pi_1(\mathbf{W}^h) = \pi_1(\mathbf{X}^h)$,
the difference $\pi_2(\mathbf{W}^h - \mathbf{X}^h)$ lies in the centre $[\mathbb{R}^d, \mathbb{R}^d]$ of the Lie group $G^{(2)}(\mathbb{R}^d)$ and is thus
equal to the difference between the piecewise linear interpolations of the accumulated areas
of $\mathbf{W}^h$ and $\mathbf{X}^h$. These accumulated areas correspond precisely to the piecewise linear
interpolations of the two random walks built from the $A^{(j)}$ and the $B^{(j)}$. The result then follows
immediately from the upper bound provided by Proposition 7.1. Indeed, this latter proposition
establishes a lower bound for all couplings of $A^{(j)}$ and $B^{(j)}$, while we are only interested in couplings
where the underlying Brownian increments are equal. Certainly the lower bound still holds for this
particular subset of couplings. □


We use the notation W*(yU, v) and W*(/i, u) to differentiate from the Wasserstein metric 
on C([0,1],R 9 ) as defined for Theorem 11.11 (where q = 2). Note that W*(-) is a 
legitimate Wasserstein metric on [R d , R d ] xjv , while W*(-) is not a true metric on GLl p (R d ). For 
the quantity W*(/i, u) to be well-defined we require that the set Af*(/i, u) is non-empty, which 
will only be the case if we have the following equality of the pushforward measures: 

(Ai)* (At) = (m)* iy) • 

That is, the laws of the two rough path measures at level 1 must be equal for W* (/i, u) to be 
well-defined. To our knowledge, [IHl §3] is the first paper to explicitly use the Wasserstein metric 
on the space of geometric p-rough paths by using the p-variation metric as a cost function. 

Remark 7.3. Let us return to considering the SDE defining Lévy area (remaining in the case
of $d = 2$ for simplicity):

\[ dx_1 = dW_1, \qquad dx_2 = dW_2, \qquad dx_3 = \tfrac12 (x_1\, dW_2 - x_2\, dW_1). \tag{7.2} \]

In common with the Milstein scheme, a simple calculation shows that the original log-ODE
method is exact: its third component at time $jh$ equals $A^{12}(0, jh) = \sum_{r=0}^{j-1} A^{(r)}$. Similarly, our
new approximation scheme $\{x^{(j)}\}_{j=1}^{N}$ can be shown to satisfy $x_3^{(j)} = \sum_{r=0}^{j-1} B^{(r)}$. Thus Proposition 7.1
implies that among all the couplings of $A^{(j)}$ and $B^{(j)}$ such that the underlying Brownian increments are
equal, the optimal rate of convergence of our approximation scheme in the Wasserstein
metric (as defined in Theorem 1.1) can be no better than $O(-h\log h)$. Theorem 1.1
gives a rate of convergence of $O(h^{1-2/\gamma-\varepsilon})$, where $\varepsilon > 0$ is arbitrary and $\gamma$ is the degree of the
Stein-Lipschitz norm of the vector fields of the original SDE. In the present case of (7.2), the
vector fields are polynomial and thus $\gamma$ can be chosen to be arbitrarily large. Therefore the
Wasserstein rate is arbitrarily close to the optimal rate of convergence (up to a logarithmic
factor). We make the disclaimer that while this argument is true from a theoretical point of
view, increasing $\gamma$ will cause a corresponding exponential increase of the constant in the RDE
Lipschitz estimate needed to prove Theorem 1.1.

Next, we focus on establishing Hölder bounds for our original coupling of $(\mathbf{W}^h, \mathbf{X}^h)$ as
provided by Proposition 6.2. First it can be shown that the common first level satisfies

\[ |||\pi_1(\mathbf{W}^h_{s,t})|||_{L^q} = |||\pi_1(\mathbf{X}^h_{s,t})|||_{L^q} \le C(q)\, |t-s|^{1/2}. \]

Similarly, $|||\pi_2(\mathbf{W}^h_{s,t})|||_{L^q},\ |||\pi_2(\mathbf{X}^h_{s,t})|||_{L^q} \le C(q)\, |t-s|$ by scaling. Thus a standard
Kolmogorov regularity result ([36, Theorem A.12]) implies that for all $\alpha \in [0, \tfrac12)$,

\[ |\rho_{\alpha\text{-H\"ol};[0,1]}(\mathbf{W}^h)|_{L^q},\ |\rho_{\alpha\text{-H\"ol};[0,1]}(\mathbf{X}^h)|_{L^q} \le C = C(q). \]

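We now consider the Hölder norm between $\mathbf{W}^h$ and $\mathbf{X}^h$. Using Proposition 6.2 with the dyadic
singleton set $E = \{j\}$ gives

\[ |||\pi_2(\mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t})|||_{L^{5/2}} \le C\, |t-s|\, h^{-1} h = C\, |t-s|, \qquad \forall\, [s,t] \subseteq [jh, (j+1)h]. \tag{7.3} \]

Similarly, for larger increments $|t-s| > h$ our coupling ensures that

\[ |||\pi_2(\mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t})|||_{L^{5/2}} \le \max_{2^m \cdot [s,t] \cap \{0,1,\dots,2^m-1\} \subseteq E} |||\gamma_E - \lambda_E|||_{L^{5/2}} \le C h\log(h^{-1}). \tag{7.4} \]

Thus,

\[ |||\pi_2(\mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t})|||_{L^{5/2}} \le C\, (h\log(h^{-1}) \wedge |t-s|). \]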

Certainly the stochastic processes $\pi_1(\mathbf{W}^h) = \pi_1(\mathbf{X}^h)$ and $\pi_2(\mathbf{W}^h)$ take their values in the 1st and
2nd inhomogeneous Wiener chaos $\mathcal{C}^1(\mathbb{P})$, $\mathcal{C}^2(\mathbb{P})$ respectively ([36, Proposition 15.20]). Recall
the decomposition (5.1):

\[ B^{(j)}_{kl} = z^{(j)}_k W^{(j)}_l - z^{(j)}_l W^{(j)}_k + \lambda^{(j)}_{kl}. \]

That is, each $B^{(j)}$ is a quadratic polynomial of Gaussian random variables (albeit with a
complicated covariance structure with respect to the $\{A^{(j)}\}$ increments). Moreover,

\[ \pi_2(\mathbf{X}^h_{jh,(j+1)h}) = \pi_2(\mathbf{W}^h_{jh,(j+1)h}) - A^{(j)} + B^{(j)}. \]

Therefore $\pi_2(\mathbf{X}^h)$ also takes values in the 2nd inhomogeneous Wiener chaos $\mathcal{C}^2(\mathbb{P})$. Combining
the equivalence of $L^q$-norms on inhomogeneous Wiener chaos space (specifically [36, Proposition
15.25]) with Proposition 6.2 yields the following difference estimate.

Proposition 7.4. There exists a constant $C > 0$ such that for all $q \ge 1$, we have

\[ |||\pi_2(\mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t})|||_{L^q} \le Cq\, (h\log(h^{-1}) \wedge |t-s|). \]

For further details on the integrability of Wiener chaos expansions we refer to [33, Proposition 3],
[36, Exercise 13.6, Theorem D.8], and [74].


Lemma 7.5. For all q > 1, there exists a constant C = C(q) such that for all a G [0, |) we 
have 

| Pct-Hol;[0,1] (W h ,X h )\ L? < C (h Mh" 1 )) ^ . 

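Proof. Since $\mathbf{W}^h$ and $\mathbf{X}^h$ share a common first level,

\[ \rho_{\alpha\text{-H\"ol};[0,1]}(\mathbf{W}^h, \mathbf{X}^h) = \rho^{(2)}_{\alpha\text{-H\"ol};[0,1]}(\mathbf{W}^h, \mathbf{X}^h) = \sup_{0 \le s < t \le 1} \frac{\|\pi_2(\mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t})\|}{|t-s|^{2\alpha}}. \]

Combining (7.3) and (7.4) guarantees that for every $\theta \in [0,1]$,

\[ |||\pi_2(\mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t})|||_{L^q} \le C\, (h\log(h^{-1}) \wedge |t-s|) \le C\, (h\log(h^{-1}))^{1-\theta}\, |t-s|^{\theta}. \]

Appealing to a standard Kolmogorov result for rough paths ([36, Theorem A.13]), we arrive
at the claim. □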
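The interpolation step in this proof rests on the elementary inequality $a \wedge b \le a^{1-\theta} b^{\theta}$ for $a, b > 0$ and $\theta \in [0,1]$. A small Python sketch (with hypothetical helper names of our own choosing, and a tiny relative tolerance for floating-point rounding) checks it for $a = h\log(h^{-1})$ and a range of gaps $b = |t-s|$:

```python
import math


def interpolated_bound(a, b, theta):
    """Right-hand side a**(1 - theta) * b**theta of the interpolation step."""
    return a ** (1.0 - theta) * b ** theta


def check_interpolation(h, gaps, thetas):
    """Check min(a, b) <= a**(1 - theta) * b**theta with a = h * log(1 / h)."""
    a = h * math.log(1.0 / h)
    return all(
        min(a, b) <= interpolated_bound(a, b, theta) * (1.0 + 1e-12)
        for b in gaps
        for theta in thetas
    )
```

The inequality holds because the geometric mean of $a$ and $b$ with weights $1-\theta$ and $\theta$ always lies between their minimum and maximum.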


For our approximation $\mathbf{X}^h$ of $\mathbf{W}^h$ to be of any use in RDE Lipschitz estimates, we need the
previous quantity $(h\log(h^{-1}))^{1-\theta}$ to be less than $O(\sqrt{h})$, or else we might as well have used the
level-1 log-ODE method, which has order $O(\sqrt{h})$ and, in common with $\mathbf{X}^h$, does not need Lévy
area increments. However this requires that $\theta < \tfrac12$, which in turn demands that $\alpha \in [0, \tfrac14)$.
Unfortunately for $\alpha < \tfrac13$, $\rho_{\alpha\text{-H\"ol}}(\cdot)$ is no longer a rough path metric for 2-rough paths. Therefore
to make the Hölder bound of Lemma 7.5 useful at all we would need to not only compute but
control the $p$-variation (where $p = \alpha^{-1} > 4$) of at least the first 4 levels of the lifts $S_\kappa(\mathbf{W}^h)$
and $S_\kappa(\mathbf{X}^h)$ ($\kappa \ge 4$), and the corresponding difference at each tensor level:

\[ \max_{k=2,\dots,\kappa} |||\pi_k(S_\kappa(\mathbf{W}^h)_{s,t} - S_\kappa(\mathbf{X}^h)_{s,t})|||_{L^q}. \]

This is precisely what we will establish in the next section. Importantly, we know from Remark
4.7 that lifting the piecewise abelian rough paths $\mathbf{W}^h$, $\mathbf{X}^h$ does not change the RDE defining
the log-ODE method and our approximation scheme.

Remark 7.6. This technique of lifting the rough paths W h and X h to higher levels k > 2 in 
order to get a better p-variation distance bound is inspired by the same technique for dealing 
with the convergence rates of Gaussian rough paths by Friz, Riedel and Xu in [331 m (see 
Remark 5.2 in the latter and the first remark after Corollary 1 of the former paper). The 
basic idea is that by lifting the driving rough path to a higher level, p is allowed to be larger, 
and hence a better estimate may be hoped for. In order for the RDE to still be well-defined 
and have a unique solution, we are penalised by requiring that the Stein-Lipschitz order of 
the vector fields is of order 7 > p. In their particular application, the previous authors recover 
(almost) optimal rates of convergence for Wong-Zakai approximations of RDEs driven by various 
Gaussian rough path lifts, including fractional Brownian motion with Hurst index H G (|, 1], 
(see the final remark after Corollary 1 of [33]). 


8 Lifted rough path estimates 

The aim of this section is to prove that the lifting trick discussed in the previous section will
actually work and give the inhomogeneous $p$-variation estimates needed to establish Corollary
9.1 in the next section. This is the content of the final result of this section, Theorem 8.6.
As our first step in proving Theorem 8.6 we consider an error estimate for our lifted rough
path coupling.
Proposition 8.1. Fix an integer $\kappa \ge 2$. Then for all $q \ge 1$ there exists a constant $C = C(\kappa, q)$
such that for each integer $2 \le m \le \kappa$:

\[ |||\pi_m(S_\kappa(\mathbf{W}^h)_{s,t} - S_\kappa(\mathbf{X}^h)_{s,t})|||_{L^q} \le C\, |t-s|^{\frac{m}{2}-1}\, (h\log(h^{-1}) \wedge |t-s|) \qquad \forall\, s < t \in [0,1]. \]


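Before giving the proof we present some useful technical results, beginning with the following
lemma (cf. §3 of [58]).

Lemma 8.2. Fix an integer $\kappa \ge 2$, $p \in (2,3)$ and suppose $\mathbf{X} \in C^{1/p\text{-H\"ol}}([0,1], G^{(2)}(\mathbb{R}^d))$. Then
there exists a constant $C = C(\kappa, p)$ such that for all $k = 1, \dots, \kappa$,

\[ \|\pi_k(\log S_\kappa(\mathbf{X})_{s,t})\|_{(\mathbb{R}^d)^{\otimes k}} \le C\, |t-s|^{k/p}\, \|\mathbf{X}\|^{k}_{1/p\text{-H\"ol};[s,t]}. \]

Proof. The function

\[ g \in G^{(\kappa)}(\mathbb{R}^d) \mapsto \max_{m=1,\dots,\kappa} \|\pi_m(\log g)\|^{1/m}_{(\mathbb{R}^d)^{\otimes m}} \]

is a homogeneous norm on $G^{(\kappa)}(\mathbb{R}^d)$ (cf. Exercise 7.38 of [36]). Therefore by the equivalence of
such norms on $G^{(\kappa)}(\mathbb{R}^d)$ ([36, Theorem 7.44]), there exists a constant $C = C(\kappa) \ge 1$ such that

\[ \|\pi_k(\log S_\kappa(\mathbf{X})_{s,t})\| \le \Big( \max_{m=1,\dots,\kappa} \|\pi_m(\log S_\kappa(\mathbf{X})_{s,t})\|^{1/m} \Big)^{k} \le C\, \|S_\kappa(\mathbf{X})_{s,t}\|^{k} \le C\, \|S_\kappa(\mathbf{X})\|^{k}_{p\text{-var};[s,t]}. \]

The Lipschitz-continuity of the rough path lift in $p$-variation guarantees the existence of a
constant $C = C(\kappa, p)$ such that $\|S_\kappa(\mathbf{X})\|_{p\text{-var};[s,t]} \le C\, \|\mathbf{X}\|_{p\text{-var};[s,t]}$. Moreover, a simple consequence
of the super-additivity of controls (cf. [30, §8.1]) gives $\|\mathbf{X}\|_{p\text{-var};[s,t]} \le |t-s|^{1/p}\, \|\mathbf{X}\|_{1/p\text{-H\"ol};[s,t]}$.
Putting the last three inequalities together completes the proof. □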


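We apply Lemma 11.3 with

\[ x_{j+1} = \xi^{(j)} = W^{(j)} + A^{(j)}, \qquad y_{j+1} = \eta^{(j)} = W^{(j)} + B^{(j)}, \qquad \text{for } j = 0, 1, \dots, N-1. \]

Then exploiting the compatibility of the tensor algebra norm, we have for all $m \le \kappa$:

\begin{align*}
&\|\pi_m(S_\kappa(\mathbf{W}^h)_{0,nh} - S_\kappa(\mathbf{X}^h)_{0,nh})\|_{(\mathbb{R}^d)^{\otimes m}} \tag{8.1} \\
&\quad = \|\pi_m(e^{x_1} \otimes \cdots \otimes e^{x_n} - e^{y_1} \otimes \cdots \otimes e^{y_n})\| \\
&\quad \le \Big\| \sum_{k=1}^{m} \frac{1}{k!} \sum_{\substack{i_1,\dots,i_k > 0 \\ i_1 + \dots + i_k = m}} \sum_{j=1}^{k} g_{i_1} \otimes \cdots \otimes g_{i_{j-1}} \otimes (g_{i_j} - h_{i_j}) \otimes h_{i_{j+1}} \otimes \cdots \otimes h_{i_k} \Big\| \\
&\quad \le \sum_{k=1}^{m} \frac{1}{k!} \sum_{\substack{i_1,\dots,i_k > 0 \\ i_1 + \dots + i_k = m}} \sum_{j=1}^{k} \|g_{i_1} \otimes \cdots \otimes g_{i_{j-1}}\|\, \|g_{i_j} - h_{i_j}\|\, \|h_{i_{j+1}} \otimes \cdots \otimes h_{i_k}\|,
\end{align*}

where

\begin{align*}
g_i &:= \pi_i(H(x_1, \dots, x_n)) = \pi_i(\log S_\kappa(\mathbf{W}^h)_{0,nh}), \\
h_i &:= \pi_i(H(y_1, \dots, y_n)) = \pi_i(\log S_\kappa(\mathbf{X}^h)_{0,nh}) \in \mathfrak{g}^{(\kappa)}(\mathbb{R}^d).
\end{align*}

Since the increments of $\mathbf{W}^h$ and $\mathbf{X}^h$ agree, $g_1 - h_1 = 0$ and

\[ g_2 - h_2 = \sum_{j=0}^{n-1} \big( A^{(j)} - B^{(j)} \big) \in [\mathbb{R}^d, \mathbb{R}^d]. \]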
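Moreover, by Lemma 8.2 given $p \in (2,3)$, there exists some constant $C = C(p)$ such that

\[ \|g_i\|_{(\mathbb{R}^d)^{\otimes i}} = \|\pi_i(\log S_\kappa(\mathbf{W}^h)_{0,nh})\|_{(\mathbb{R}^d)^{\otimes i}} \le C\, (nh)^{i/p}\, \|\mathbf{W}^h\|^{i}_{1/p\text{-H\"ol};[0,nh]}, \]

with a similar bound for $h_i$ with $\mathbf{X}^h$. Therefore the previous inequality becomes

\begin{align*}
&\|\pi_m(S_\kappa(\mathbf{W}^h)_{0,nh} - S_\kappa(\mathbf{X}^h)_{0,nh})\|_{(\mathbb{R}^d)^{\otimes m}} \tag{8.2} \\
&\quad \le C \sum_{k=1}^{m} \frac{1}{k!} \sum_{\substack{i_1,\dots,i_k > 0 \\ i_1 + \dots + i_k = m}} \sum_{j=1}^{k} (nh)^{\frac{m - i_j}{p}}\, \|\mathbf{W}^h\|^{i_1 + \dots + i_{j-1}}_{1/p\text{-H\"ol};[0,nh]}\, \|\mathbf{X}^h\|^{i_{j+1} + \dots + i_k}_{1/p\text{-H\"ol};[0,nh]}\, \|g_{i_j} - h_{i_j}\| \\
&\quad \le C \sum_{k=1}^{m} \frac{1}{k!} \sum_{\substack{i_1,\dots,i_k > 0 \\ i_1 + \dots + i_k = m}} \sum_{j=1}^{k} (nh)^{\frac{m - i_j}{p}}\, \Big( \|\mathbf{W}^h\|_{1/p\text{-H\"ol};[0,nh]} \vee \|\mathbf{X}^h\|_{1/p\text{-H\"ol};[0,nh]} \Big)^{m - i_j}\, \|g_{i_j} - h_{i_j}\|.
\end{align*}

So it remains to bound $\|g_i - h_i\|_{(\mathbb{R}^d)^{\otimes i}}$ in $L^q$ for each integer $3 \le i \le \kappa$. This is the content
of Lemma 8.5 below, which takes particular care in separating out the dependencies on the
variables $m$ and $n$. For the specific calculations of $g_m - h_m$ for $m \le 5$ we refer to the iterated
Baker-Campbell-Hausdorff formula (11.1) found in the appendix at the end of the paper. But
first we need the following consequence of symmetry.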

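Lemma 8.3. Unless the indices $(j_1, \dots, j_{2n}) \in \{0, 1, \dots, N-1\}^{2n}$ are present in pairs, we have

\[ \mathbb{E}\big( W^{(j_1)} \otimes \cdots \otimes W^{(j_{2n})} \,\big|\, \{A^{(j)}, B^{(j)}\}_{j=0}^{N-1} \big) = 0. \]

Proof. Suppose $j_i$ is without a pair in $\{j_1, \dots, j_{i-1}, j_{i+1}, \dots, j_{2n}\} =: I$ (that is, $j_i \notin I$). Then
setting $\sigma(A, B) := \{A^{(j)}, B^{(j)} : 0 \le j \le N-1\}$, the tower property of conditional expectation
gives

\begin{align*}
&\mathbb{E}\big( W^{(j_1)} \otimes \cdots \otimes W^{(j_i)} \otimes \cdots \otimes W^{(j_{2n})} \,\big|\, \sigma(A, B) \big) \\
&\quad = \mathbb{E}\big( W^{(j_1)} \otimes \cdots \otimes \mathbb{E}\big( W^{(j_i)} \,\big|\, \sigma(A, B), \{W^{(j_k)}\}_{k \ne i} \big) \otimes W^{(j_{i+1})} \otimes \cdots \otimes W^{(j_{2n})} \,\big|\, \sigma(A, B) \big).
\end{align*}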


There exists a matrix 
that 


Z^) G 1 ^ xd with entries taking values in {±z^\ ±CfcKfc=i 


/ ,4(ji) 


= : + 


I(Ui) 

A(k) 


J 


such 


where A&l = (^ Ay ^,..., K-i <f) e b and B^'\ K^ l \ defined similarly. By symme¬ 
try we can change the sign of all entries on the right-hand side without changing the law of 
(A^\ B^y- that is, 



(8.3) 


Recalling that {z^\ ( (j \ K^\ are independent of all Brownian increments {W^}^ = q 1 , 

it follows from (j8.3f) that: 

P (wM > x\a(A,B), {W Uk) } k ^ = P ( W^ ji) < -x\a{A,B), {W Uk) } k ^ for all x > 0, 
for any given coordinate c G {1,..., d}. This symmetry of the conditional density yields 

(fi,E(w^\a(A,B), {VF^}^)) = E 

= E (w^\a(A,B), {W™}^ = 0, 
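and so

\begin{align*}
&\mathbb{E}\big( W^{(j_1)} \otimes \cdots \otimes W^{(j_i)} \otimes \cdots \otimes W^{(j_{2n})} \,\big|\, \sigma(A, B) \big) \\
&\quad = \mathbb{E}\big( W^{(j_1)} \otimes \cdots \otimes 0 \otimes W^{(j_{i+1})} \otimes \cdots \otimes W^{(j_{2n})} \,\big|\, \sigma(A, B) \big) = 0.
\end{align*}

The proof is complete. □

For convenience, let us define the associative operation $*$ on $\mathfrak{g}^{(m)}(\mathbb{R}^d)$ by

\[ x_1 * x_2 * \cdots * x_{k-1} * x_k := [x_1, [x_2, [\dots, [x_{k-1}, x_k]]]]. \tag{8.4} \]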

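Lemma 8.4. There exists a universal constant $C = C(m)$ such that for every $x_1, \dots, x_k \in \mathfrak{g}^{(m)}(\mathbb{R}^d)$, we have

\[ \|\pi_m(x_1 * \cdots * x_k)\|_{(\mathbb{R}^d)^{\otimes m}} \le C \sum_{1 \le j_1, \dots, j_m \le d} \big| \langle e^*_{j_1} \otimes \cdots \otimes e^*_{j_m},\ \pi_m(x_1 \otimes \cdots \otimes x_k) \rangle \big|. \]

Proof. If $k > m$ then the inequality is trivial: both sides are zero. So we suppose that $k \le m$.
Unfolding the nested Lie brackets of $x_1 * \cdots * x_k$, there will be at most $2^k$ ($\le 2^m$) non-zero terms
in the resultant expansion, with each term taking the form

\[ \pm\, x_{\sigma(1)} \otimes x_{\sigma(2)} \otimes \cdots \otimes x_{\sigma(k)}, \]

for some $\sigma \in S_k$, where $S_k$ is the symmetric group of permutations on $\{1, \dots, k\}$ (see Lemma
11.4). In other words, every $x_i$ will be present exactly once in each term in the expansion of
$x_1 * \cdots * x_k$. By symmetry, $\|x_{\sigma(1)} \otimes \cdots \otimes x_{\sigma(k)}\|_{(\mathbb{R}^d)^{\otimes m}} = \|x_1 \otimes \cdots \otimes x_k\|_{(\mathbb{R}^d)^{\otimes m}}$. Since $k \le m$,
it follows that there exists a constant $C = C(m)$ such that

\begin{align*}
\|\pi_m(x_1 * \cdots * x_k)\|_{(\mathbb{R}^d)^{\otimes m}} &\le C\, \|\pi_m(x_1 \otimes \cdots \otimes x_k)\|_{(\mathbb{R}^d)^{\otimes m}} \\
&= C \Big( \sum_{1 \le j_1, \dots, j_m \le d} \big| \langle e^*_{j_1} \otimes \cdots \otimes e^*_{j_m},\ \pi_m(x_1 \otimes \cdots \otimes x_k) \rangle \big|^2 \Big)^{1/2}.
\end{align*}

Using the equivalence of norms on finite-dimensional vector spaces (in particular, $\|a\|_{\ell^2} \le \|a\|_{\ell^1}$
for $a \in \mathbb{R}^n$) yields

\[ \|\pi_m(x_1 \otimes \cdots \otimes x_k)\|_{(\mathbb{R}^d)^{\otimes m}} \le \sum_{1 \le j_1, \dots, j_m \le d} \big| \langle e^*_{j_1} \otimes \cdots \otimes e^*_{j_m},\ \pi_m(x_1 \otimes \cdots \otimes x_k) \rangle \big|. \]

The claim follows. □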
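The combinatorial claim in this proof, that unfolding a nested bracket produces signed words in which every $x_i$ appears exactly once, can be illustrated symbolically. The Python sketch below is our own illustration (the function `nested_bracket` is a hypothetical helper): it treats the $x_i$ as non-commuting formal letters and ignores the projection $\pi_m$ and any grading.

```python
def nested_bracket(letters):
    """Expand [x1, [x2, [..., [x_{k-1}, x_k]]]] into signed words.

    Returns a dict mapping each word (a tuple of letters) to its coefficient."""
    if len(letters) == 1:
        return {(letters[0],): 1}
    head, tail = letters[0], nested_bracket(letters[1:])
    out = {}
    for word, coeff in tail.items():
        # [head, w] = head w - w head for every word w of the inner bracket
        left, right = (head,) + word, word + (head,)
        out[left] = out.get(left, 0) + coeff
        out[right] = out.get(right, 0) - coeff
    return {w: c for w, c in out.items() if c != 0}
```

For $k$ distinct letters this expansion has $2^{k-1}$ terms, which is consistent with the bound of at most $2^k$ terms used in the proof.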


Lemma 8.5. Fix $q \ge 1$, an integer $n \ge 1$, and for $m = 2, \dots, \kappa$ define:

\[ g_m = \pi_m(\log S_\kappa(\mathbf{W}^h)_{0,nh}), \qquad h_m = \pi_m(\log S_\kappa(\mathbf{X}^h)_{0,nh}) \in \mathfrak{g}^{(m)}(\mathbb{R}^d). \]

Then for each $m$, there exists a constant $C_m = C_m(\kappa, q)$ such that

\[ |||g_m - h_m|||_{L^q} \le C_m\, (nh)^{\frac{m}{2}-1}\, h\log(h^{-1}). \tag{8.5} \]


Proof. Without loss of generality we may assume $q = 2r$ for some integer $r \ge 1$. The case
of $m = 2$ is precisely the content of our coupling result in Proposition 6.2 so we restrict the
proof to $3 \le m \le \kappa$. We define $x_i$, $y_i$ as in (8.1). Let $\tau_k : \mathfrak{g}^{(\kappa)}(\mathbb{R}^d) \to \mathfrak{g}^{(m)}(\mathbb{R}^d)$ denote the
canonical projection onto nested Lie bracket terms of length $k$. Note that $\tau_k(g_m - h_m) = 0$ for
all $k \ge m$. Indeed, for $k > m$ this is a consequence of truncation, while in the case of $k = m$,
the nested Lie brackets will only consist of increment terms $W^{(j)}$, which are common to both
expressions. Using Lemma 11.4 for rearranging nested Lie brackets into brackets of the form
of (8.4) if needed, the Baker-Campbell-Hausdorff formula gives index sets $\lambda \subseteq \{1, \dots, N-1\}^k$,
collected in a family $\Lambda_k$, and corresponding coefficients $c_\lambda$ such that

\[ \tau_k(g_m) = \sum_{\lambda \in \Lambda_k} c_\lambda \sum_{(i_1, \dots, i_k) \in \lambda} \pi_m(x_{i_1} * \cdots * x_{i_k}). \]

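Setting $z_i := x_i - y_i = A^{(i-1)} - B^{(i-1)}$, we can exploit the property that a nested bracket of
length $k$ is a $k$-multilinear map and employ a telescoping sum to find that

\begin{align*}
\tau_k(g_m - h_m)
&= \sum_{\lambda \in \Lambda_k} c_\lambda \sum_{(i_1, \dots, i_k) \in \lambda} \pi_m\big( x_{i_1} * \cdots * x_{i_k} - y_{i_1} * \cdots * y_{i_k} \big) \\
&= \sum_{\lambda \in \Lambda_k} c_\lambda \sum_{(i_1, \dots, i_k) \in \lambda} \pi_m\big( z_{i_1} * x_{i_2} * \cdots * x_{i_k} + y_{i_1} * z_{i_2} * x_{i_3} * \cdots * x_{i_k} + \dots + y_{i_1} * \cdots * y_{i_{k-1}} * z_{i_k} \big) \\
&= \sum_{r=1}^{k} \sum_{\lambda \in \Lambda_k} c_\lambda \sum_{(i_1, \dots, i_k) \in \lambda} \pi_m\big( y_{i_1} * \cdots * y_{i_{r-1}} * z_{i_r} * x_{i_{r+1}} * \cdots * x_{i_k} \big).
\end{align*}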
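The telescoping identity above only uses multilinearity of the nested bracket, so it can be sanity-checked in any Lie algebra. The Python sketch below is an illustration under our own assumptions: small integer matrices with the commutator bracket stand in for the Lie algebra elements, and all function names are hypothetical.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return sub(matmul(a, b), matmul(b, a))


def star(*xs):
    """x1 * ... * xk := [x1, [x2, [..., [x_{k-1}, x_k]]]]."""
    result = xs[-1]
    for x in reversed(xs[:-1]):
        result = bracket(x, result)
    return result


def telescoped(xs, ys):
    """Sum over r of y_1 * ... * y_{r-1} * z_r * x_{r+1} * ... * x_k, z = x - y."""
    total = [[0] * len(xs[0]) for _ in xs[0]]
    for r in range(len(xs)):
        z = sub(xs[r], ys[r])
        total = add(total, star(*ys[:r], z, *xs[r + 1:]))
    return total
```

With integer entries the two sides agree exactly, so `sub(star(*xs), star(*ys))` can be compared with `telescoped(xs, ys)` using plain equality.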


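Note that each nested Lie bracket term in the difference $\tau_k(g_m - h_m)$ will contain $N_1$ increment
terms (that is, $W^{(j)}$) and $N_2$ area terms (either $A^{(j)}$ or $B^{(j)}$) such that $N_1 + N_2 = k$ and $N_1 + 2N_2 = m$.
By changing the order of summation if necessary, for each $(r, \lambda)$ pair there exist index sets
$\lambda_r$ and $\lambda^I$ (the latter consisting of necessarily consecutive elements), such that

\begin{align*}
&\sum_{(i_1, \dots, i_k) \in \lambda} \pi_m\big( y_{i_1} * \cdots * y_{i_{r-1}} * z_{i_r} * x_{i_{r+1}} * \cdots * x_{i_k} \big) \\
&\quad = \sum_{I = (i_1, \dots, i_{r-1}, i_{r+1}, \dots, i_k) \in \lambda_r}\ \sum_{i_r \in \lambda^I} \pi_m\big( y_{i_1} * \cdots * y_{i_{r-1}} * z_{i_r} * x_{i_{r+1}} * \cdots * x_{i_k} \big) \\
&\quad = \sum_{I = (i_1, \dots, i_{r-1}, i_{r+1}, \dots, i_k) \in \lambda_r} \pi_m\Big( y_{i_1} * \cdots * y_{i_{r-1}} * \Big( \sum_{j \in \lambda^I} z_j \Big) * x_{i_{r+1}} * \cdots * x_{i_k} \Big). \tag{8.6}
\end{align*}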

We can always choose $\lambda^I$ to be made up of consecutive elements because the induction proof of
the iterated Baker-Campbell-Hausdorff formula (11.1) and the bilinearity of nested Lie brackets
guarantee that for any fixed indices $i_1, \dots, i_{r-1}, i_{r+1}, \dots, i_k \in \{1, \dots, n\}$, the set

\[ \{ j : (i_1, \dots, i_{r-1}, j, i_{r+1}, \dots, i_k) \in \lambda \} \]

is either empty or made up of consecutive elements.
Since the indices of $\lambda^I$ are consecutive,

\[ \sum_{j \in \lambda^I} z_j = \sum_{j \in \lambda^I} \big( A^{(j-1)} - B^{(j-1)} \big) = \pi_2\big( \mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t} \big), \]

for some pair $(s,t) = (n_1 h, n_2 h)$ where $n_1, n_2 \in \{0, 1, \dots, N\}$. Applying Proposition 7.4 to the
right-hand side yields the estimate

\[ \Big|\Big|\Big|\, \Big\| \sum_{j \in \lambda^I} z_j \Big\|_{(\mathbb{R}^d)^{\otimes 2}} \Big|\Big|\Big|_{L^q} \le C(q)\, h\log(h^{-1}). \tag{8.7} \]

Alternatively, we could have argued that for every set of fixed indices $i_1, \dots, i_{r-1}, i_{r+1}, \dots, i_k \in
\{1, \dots, n\}$, the set

\[ \{ j : (i_1, \dots, i_{r-1}, j, i_{r+1}, \dots, i_k) \in \lambda \} \]

can be decomposed into at most $k$ unique sets, each composed of consecutive elements themselves.
Indeed, the other $k - 1$ indices impose at most $k$ restrictions on the free index $j$ (see the
iterated formula (11.1)). Since $k \le m$, it follows that the constant $C(q)$ in (8.7) would have to
be replaced by $C(q, m)$ (that is, a constant dependent on both $q$ and $m$).

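Set $Z$ to be the projection of (8.6) onto an arbitrary coordinate $(c_1, \dots, c_m)$ of $(\mathbb{R}^d)^{\otimes m}$:

\[ Z := \Big\langle e^*_{c_1} \otimes \cdots \otimes e^*_{c_m},\ \sum_{(i_1, \dots, i_{r-1}, i_{r+1}, \dots, i_k) \in \lambda_r} \pi_m\Big( y_{i_1} \otimes \cdots \otimes y_{i_{r-1}} \otimes \Big( \sum_{j \in \lambda^I} z_j \Big) \otimes x_{i_{r+1}} \otimes \cdots \otimes x_{i_k} \Big) \Big\rangle, \]

and for $I = (i_1, \dots, i_{r-1}, i_{r+1}, \dots, i_k) \in \lambda_r$ define

\[ P_I := \Big\langle e^*_{c_1} \otimes \cdots \otimes e^*_{c_m},\ \pi_m\Big( y_{i_1} \otimes \cdots \otimes y_{i_{r-1}} \otimes \Big( \sum_{j \in \lambda^I} z_j \Big) \otimes x_{i_{r+1}} \otimes \cdots \otimes x_{i_k} \Big) \Big\rangle. \]

By the equivalence of norms given by Lemma 8.4 it suffices to prove that

\[ |Z|_{L^q} \le C\, (nh)^{\frac{m}{2}-1}\, (h\log(h^{-1})) \]

in order to establish (8.5). To this end, note that since $q = 2r$ for some integer $r$, we have

\[ |Z|^{q}_{L^q} = |Z|^{2r}_{L^{2r}} = \mathbb{E}\Big( \Big( \sum_{I \in \lambda_r} P_I \Big)^{2r} \Big) = \sum_{I_1 \in \lambda_r} \cdots \sum_{I_{2r} \in \lambda_r} \mathbb{E}\big( P_{I_1} \cdots P_{I_{2r}} \big). \]

The $qN_1 = 2rN_1$ increment terms in the expression for $|Z|^q_{L^q}$ must be paired by Lemma 8.3
and the tower property of conditional expectation. There are $(qN_1 - 1)!! \le C(m, q)$ possible
perfect matchings (where $N!! := N(N-2) \cdots 3 \cdot 1$ for $N = 2k+1$). Each matching is summed over at most
$n$ possible indices and has scaling $O(h)$ in $L^p$ (for all $p \ge 1$). The remaining $q(N_2 - 1)$ area
sums are taken over at most $n$ indices each, with each constituent area term having scaling
$O(h)$ in $L^p$. By combining these observations with (8.7) and the Cauchy-Schwarz inequality,
we conclude that

\begin{align*}
|Z|^{2r}_{L^{2r}} &\le C\, (nh)^{rN_1}\, (nh)^{2r(N_2 - 1)}\, (h\log(h^{-1}))^{2r} = C\, (nh)^{r(N_1 + 2N_2 - 2)}\, (h\log(h^{-1}))^{2r} \\
&= C\, (nh)^{r(m-2)}\, (h\log(h^{-1}))^{2r}
\end{align*}

and the result follows. □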
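The count of perfect matchings used in this proof is the standard double factorial: $2n$ items can be split into unordered pairs in $(2n-1)!!$ ways. A short Python sketch (our own illustration, with hypothetical function names) enumerates the matchings directly and compares the count with the double factorial:

```python
def perfect_matchings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def double_factorial(n):
    """n!! = n (n - 2) (n - 4) ... down to 1 or 2, with the value 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result
```

An odd number of items admits no perfect matching, which mirrors the vanishing of the conditional expectation in Lemma 8.3 when an index is left unpaired.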


Proof of Proposition 8.1. For $|t-s| \le h$ the result is immediate from scaling so we restrict
ourselves to $|t-s| > h$. Since $\mathbf{W}^h$ and $\mathbf{X}^h$ are both piecewise abelian over each interval
$[jh, (j+1)h]$ it suffices to prove the claim for $(s,t) = (0, nh)$.

To this end we combine (8.2) with the estimates of Lemma 8.5 using the Cauchy-Schwarz
inequality and find


\^m (S K (yk h )o, n h ~ S K (X h ) 0 , n h) || (Rd)S 


e ew 5 

k =1 j=1 

il + ...+i k =m 


( m—ij) 


Li 


\w h 


I l/p-Hol;[0,n/i] 


L 2 i 


+ 


||x ft 


i m—ij 

I l/p-Hol;[0, nh] 


L 2 i 


- C 2 Y.p\ E Ew 

k =1 u,..-!*fc>0 3 = 1 

ii+...+i k =m 


Pm-ip+f -1 


h log(/r 


-i \ 
The last line uses the integrability of the random quantities

\[ \exp\big( a\, \|\mathbf{W}^h\|^2_{1/p\text{-H\"ol};[0,1]} \big), \qquad \exp\big( a\, \|\mathbf{X}^h\|^2_{1/p\text{-H\"ol};[0,1]} \big) \]

for some $a > 0$ in any given $L^q$-norm (cf. Corollary 13.14 of [36]). Since $nh \le Nh = 1$, we have

m k 

h m (S K (W\ nk -S K (X l ') 0>nk )\\ ( ^<C 2 J2-g Y. J>V) ¥-1 Mog (ft" 1 ) 

k =1 u,...,i fc >0 j=l 

il+...+i k =m 

< C 3 {nh)^~ l h\og(h~ l ), 

for some constant $C_3 = C_3(m, \kappa) > 0$. The proof is complete. □

We are now in a position to prove the main result of this section, namely the claimed lifted
rough path estimates.

Theorem 8.6. Fix an integer $\kappa \ge 2$ and set $p \ge \kappa$ such that $\lfloor p \rfloor = \kappa$. Then for all $q \ge 1$,
there exists a constant $C = C(\kappa, p, q)$ such that for all $\varepsilon > 0$,

\[ |\rho_{1/p\text{-H\"ol};[0,1]}(S_\kappa(\mathbf{W}^h), S_\kappa(\mathbf{X}^h))|_{L^q} \le C h^{1 - 2/p - \varepsilon}. \]

Proof. Fix $\theta \in [0,1]$ and note that

\[ |t-s|^{\frac{m}{2}-1}\, (h\log(h^{-1}) \wedge |t-s|) \le (h\log(h^{-1}))^{1-\theta}\, |t-s|^{\frac{m}{2}-1+\theta} \le (h\log(h^{-1}))^{1-\theta}\, |t-s|^{\frac{m\theta}{2}}, \]

since $\frac{m}{2} - 1 + \theta \ge \frac{m\theta}{2}$ for $m \ge 2$. Then combining the fact that $\pi_1(\mathbf{W}^h_{s,t} - \mathbf{X}^h_{s,t}) = 0$ with
Proposition 8.1 guarantees that

< C(/i log (/i -1 )) 1-61 \t — s|~ . 


max 

m= 


7[ rt 


{S K ( W h ) 


s,t 


s K 


I s,t 


Appealing to the rough path Kolmogorov regularity theorem of [36, Theorem A. 13], it follows 
that for all 7 < | we have 


max 

m= 


sup 

0<s<t<l 


MS K (W%-S K (X%)|| 


1 1 


17717 


< Cihlogih- 1 )) 1 - 9 . 

Li 


Therefore so long as p = 7 1 > |, 


Pl/ P -H61;[0,l] (S K (W' l ),S K (X' l ))| w < Cilllogih- 1 )) 1 - 9 . 


The condition p > | is satisfied by taking 9 = ^ — 6 for some small constant £ > 0. Since this 
latter constant is arbitrary, we may drop the log(/r _1 ) factor from our bound and the result 
follows. □ 


Remark 8.7. Although Theorem 8.6 requires that $p \ge 2$, as a quick heuristic and sanity check
we note that if $p < 2$ then the right-hand side of its bound explodes as $N = h^{-1} \to \infty$ (as was
indicated by our previous calculations in Section 7). Moreover, if $p \in [2, 4)$ then the convergence
rate is worse than $O(\sqrt{h})$, as predicted.
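The thresholds in this remark follow from the exponent $1 - 2/p$ of $h$ in Theorem 8.6 (ignoring the arbitrarily small $\varepsilon$). A minimal Python sketch, with function names of our own choosing, makes the comparison with the Euler rate $1/2$ explicit:

```python
def rate_exponent(p):
    """Exponent 1 - 2/p of h in the bound of Theorem 8.6, ignoring epsilon."""
    if p <= 0:
        raise ValueError("p must be positive")
    return 1.0 - 2.0 / p


def beats_euler(p):
    """True when the exponent exceeds the Euler scheme's strong rate 1/2."""
    return rate_exponent(p) > 0.5
```

The exponent is negative for $p < 2$, so the bound blows up as $h \to 0$, and it exceeds $1/2$ exactly when $p > 4$.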
9 Application to coupling SDE solutions

We immediately put to work the inhomogeneous $p$-variation estimate of Theorem 8.6 in order
to derive Wasserstein convergence rates between the solutions of the previous RDEs (with
drift) driven by $\mathbf{W}^h$ and $\mathbf{X}^h$ (or, equivalently, their rough path lifts). Recall that the solutions
coincide with the approximation scheme $\{x_j\}_{j=1}^{N}$ and the output of the log-ODE method. By
coupling the rough paths $\mathbf{W}^h$, $\mathbf{X}^h$ (or more precisely their lifts), the following result is an almost
immediate corollary of the Lipschitz-continuity of the Itô map. In turn the following corollary
immediately implies the main result of the paper, Theorem 1.1.

Corollary 9.1. Let x denote the solution to the SDE hS. 1\) where we assume that Vq G Lip^M 9 ) 
and V = {V k } d k=1 G Lip 7 (M <? ) for 7 > p > 2. Let y be the solution to the following RDE driven 
byX h : 

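\[ dy_t = V(y_t)\, d\mathbf{X}^h_t + V_0(y_t)\, dt, \qquad y_0 = x_0 \in \mathbb{R}^q. \tag{9.1} \]

Then there exists a constant $C = C(p, \gamma, \|V\|_{\mathrm{Lip}^\gamma})$ such that for all $\varepsilon > 0$,

\[ \Big|\Big|\Big| \max_{j=1,\dots,N} \|x(jh) - y(jh)\|_{\mathbb{R}^q} \Big|\Big|\Big|_{L^2} \le C h^{1 - 2/p - \varepsilon}. \]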

Proof. Let $\kappa \ge 2$ be the unique integer satisfying $\lfloor p \rfloor = \kappa$. Recalling Remark 4.7 we can
rewrite the RDE (9.1) as

\[ dy_t = V(y_t)\, dS_\kappa(\mathbf{X}^h)_t + V_0(y_t)\, dt, \qquad y_0 = x_0 \in \mathbb{R}^q. \]

A corresponding statement holds for the pair $\mathbf{W}^h$, $S_\kappa(\mathbf{W}^h)$: let $z$ be the solution to the equivalent
RDEs with drift:

\[ dz_t = V(z_t)\, d\mathbf{W}^h_t + V_0(z_t)\, dt = V(z_t)\, dS_\kappa(\mathbf{W}^h)_t + V_0(z_t)\, dt, \qquad z_0 = x_0. \]

From Theorem 3.2 and Proposition 4.5 we already have

\[ \Big|\Big|\Big| \max_{j=1,\dots,N} \|x(jh) - z(jh)\|_{\mathbb{R}^q} \Big|\Big|\Big|_{L^2} \le Ch, \]

and so it suffices to prove $|||\, \|y - z\|_{\infty} \,|||_{L^2} \le C h^{1 - 2/p - \varepsilon}$.

To this end, we employ the RDE Lipschitz estimate of Proposition 12.51 to find that for each 

a > 0 , 

II V Z|L — Pp-var;[0,l] {l/i %) 

< (s K (w 1 ). (x ft )) (1 + ||s« (W‘) tLkn + lk« (X") tLkn) 

• exp (Ci {1 + N aiP (S K (’ W h ) , [0,1]) + N a , p (S K (X h ) , [0,1])}) (9.2) 

for some (deterministic) constant C\ = Ci(p, 7, ||R || Lip7 , a). By the Lipschitz-continuity of 
the rough path lift and the fact that W h is the piecewise abelian approximation of enhanced 
Brownian motion W, 

l|s«<w*)IUwi - Il s '-( W '‘)II( 2+S ,.v„ ;l 0, 1 ] £ ci(K, S) ||w'-|| (2+s>vati|0il] < ||w|| (1M) . WiM , 

( 9 - 3 ) 

for arbitrary 6 > 0. The last quantity is integrable in L q for all q > 1 f [36i. Corollary 13.14]) 
with an identical statement holding for S K (X h ). 

Next we establish the integrability of the exponential term in (9.2) by following the proofs of
Theorems 10 and 13 in [9] along with [32, Corollary 5]. First let $1/r + 1/s > 1$, where $r \in (2, 3)$.
Using the results of [15, Remark 6.4] and [32, Corollary 2 and Remark 1], there exist constants
$\alpha = \alpha(r, s) > 0$ and $c_1 = c_1(r, s) > 0$ such that the tail estimate

\[ \mathbb{P}\big(N_{\alpha,r}(W, [0,1]) > u\big) \le \exp\big(-c_1 \alpha^{2/r} u^{2/s}\big) \]

holds for all $u > 0$ and step-sizes $h > 0$. Since $\|W^h\|_{r\text{-var};[s,t]} \le \|W\|_{r\text{-var};[s,t]}$, Lemma 2 of [32]
gives $N_{\alpha,r}(W^h, [0,1]) \le N_{\alpha,r}(W, [0,1])$. Hence,

\[ \mathbb{P}\big(N_{\alpha,r}(W^h, [0,1]) > u\big) \le \mathbb{P}\big(N_{\alpha,r}(W, [0,1]) > u\big). \]

Then the Lipschitz-continuity of $S_\kappa(\cdot)$ and [32, Lemma 2] imply that

\[ \mathbb{P}\big(N_{\alpha,r}(S_\kappa(W^h), [0,1]) > u\big) \le \mathbb{P}\big(N_{\alpha,r}(W^h, [0,1]) > u\big) \le \exp\big(-c_1 \alpha^{2/r} u^{2/s}\big) \]

also holds for all $u > 0$, $h > 0$, and for a possibly smaller $\alpha > 0$ (as in the proof of Theorem 13
of [9]). The interpolation result of Lemma 8.16 of [36] tells us that

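\[ \omega_1(s,t) := \|S_\kappa(W^h)\|^{p}_{p\text{-var};[s,t]} \le \omega_2(s,t) \cdot \sup_{s \le u < v \le t} \|S_\kappa(W^h)_{u,v}\|^{p-r}, \]

where $\omega_2(s,t) := \|S_\kappa(W^h)\|^{r}_{r\text{-var};[s,t]}$. Thus taking $u < v \in [s,t]$ such that $\omega_2(u,v) \le \alpha$, we then
have $\omega_1(u,v) \le \alpha^{p-r} \omega_2(u,v)$. Again appealing to [32, Lemma 2] yields

\[ N_{\alpha^{p-r}\alpha,\,p}\big(S_\kappa(W^h), [s,t]\big) \le N_{\alpha,r}\big(S_\kappa(W^h), [s,t]\big). \]

If $\alpha \le 1$ then $N_{\alpha,p}(S_\kappa(W^h), [0,1]) \le N_{\alpha,r}(S_\kappa(W^h), [0,1])$, and so

\[ \mathbb{P}\big(N_{\alpha,p}(S_\kappa(W^h), [0,1]) > u\big) \le \exp\big(-c_1 \alpha^{2/r} u^{2/s}\big). \tag{9.4} \]

On the other hand, if $\alpha > 1$ then Lemma 3 of [32] gives

\[ N_{\alpha,p}\big(S_\kappa(W^h), [0,1]\big) \le \alpha^{p-r}\Big(1 + 2N_{\alpha^{p-r}\alpha,\,p}\big(S_\kappa(W^h), [0,1]\big)\Big) \le \alpha^{p-r}\Big(1 + 2N_{\alpha,r}\big(S_\kappa(W^h), [0,1]\big)\Big), \]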

which in turn gives the tail estimate

\[ \mathbb{P}\big(N_{\alpha,p}(S_\kappa(W^h), [0,1]) > u\big) \le c_2 \exp\big(-c_3\, \alpha^{2/r + 2(r-p)/s}\, u^{2/s}\big) \tag{9.5} \]

for some constants $c_i = c_i(r, s, \alpha) > 0$, $i = 2, 3$. In either case, the tail estimates (9.4) and (9.5),
together with the fact that $s \in (1,2)$, guarantee that for all $q \ge 1$,

\[ \sup_{h > 0} \Big\| \exp\big(C_1 N_{\alpha,p}(S_\kappa(W^h), [0,1])\big) \Big\|_{L^q} \le C_3 = C_3(\kappa, q, \alpha) < \infty. \]

By an identical argument, corresponding uniform estimates with $S_\kappa(X^h)$ instead of $S_\kappa(W^h)$ can
be established. Consequently, for all $q \ge 1$,

\[ \sup_{h > 0} \Big\| \exp\Big(C_1 \big\{1 + N_{\alpha,p}(S_\kappa(W^h), [0,1]) + N_{\alpha,p}(S_\kappa(X^h), [0,1])\big\}\Big) \Big\|_{L^q} \le C_3 = C_3(\kappa, q, \alpha) < \infty. \tag{9.6} \]

The proof is concluded by applying the Cauchy-Schwarz inequality to (9.2) and then using the
estimates provided by (9.3), (9.6) and Theorem 8.6. □
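
To get a feel for the rate in Corollary 9.1, the following Python sketch evaluates the exponent $1 - 2/p - \varepsilon$ and compares it with the Euler scheme's strong order $1/2$. It is an illustration only: the function names and the sample values of $p$ are assumptions introduced here, not part of the original argument.

```python
def wasserstein_rate_exponent(p: float, eps: float = 0.0) -> float:
    """Exponent of h in the bound C * h**(1 - 2/p - eps) of Corollary 9.1."""
    if p <= 2:
        raise ValueError("Corollary 9.1 assumes p > 2")
    return 1.0 - 2.0 / p - eps


def beats_euler(p: float, eps: float = 0.0) -> bool:
    """True when the exponent exceeds the Euler strong order 1/2."""
    return wasserstein_rate_exponent(p, eps) > 0.5


if __name__ == "__main__":
    # Illustrative values of p only; eps is taken to be 0 for readability.
    for p in (3.0, 4.0, 5.0, 8.0):
        print(p, wasserstein_rate_exponent(p), beats_euler(p))
```

Ignoring $\varepsilon$, the exponent exceeds $1/2$ exactly when $p > 4$, which is simple arithmetic on the stated bound.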


10 Concluding remarks 

We discuss the possibility of two extensions of Theorem 1.1.
10.1 Fractional Brownian motion


A natural question to ask is whether Theorem 1.1 can be generalised to the case where the
driving Brownian motion of the SDE (3.1) is replaced with a fractional Brownian motion with
Hurst index $H \in (\tfrac{1}{3}, \tfrac{1}{2})$ (turning (3.1) into an RDE). Recall that Brownian motion corresponds
to $H = \tfrac{1}{2}$. The log-ODE method remains perfectly valid for this RDE, as does our piecewise
abelian interpretation, since the Levy area of fractional Brownian motion is well-defined for
$H > \tfrac{1}{4}$. (This is not the case for $H \le \tfrac{1}{4}$ [66, §1]; even the lift of the standard piecewise
linear interpolation does not converge in $p$-variation under $L^1$ for $p > H^{-1} \ge 4$ [53, §4.5].)
Instead, the main obstacle of this extension is the need to reproduce the main coupling result
of Proposition 6.2. This is problematic because the proof of Davie in [23] inherently relies upon
the independence of increments in the Brownian motion case in order to perform the inductive
coupling over finer dyadic intervals of [0,1]. Similarly, the original Komlos-Major-Tusnady
Theorem and the modern extensions of Zaitsev also critically rely upon the independence of
increments of the random walk to be coupled with a Gaussian approximation. Since this is no
longer the case when $H \ne \tfrac{1}{2}$, the authors see no way around this at present. In the even more
extreme case of $H \in (\tfrac{1}{4}, \tfrac{1}{3}]$, for the log-ODE method to converge we require the first $\lfloor H^{-1} \rfloor = 3$
levels of the log-signature. The task of coupling these higher order terms is difficult, even in
the Brownian case (as we now discuss).

10.2 Higher order log-ODE approximations 

This paper has dealt exclusively with the log-ODE method at level $m = 2$; that is, the
log-signature of the Brownian motion has been truncated to its first two levels. Thus we have only
needed to couple the Levy area increments with a Gaussian approximation (conditional on
the underlying Brownian increments). A natural extension would be to couple the higher order
terms of the log-signature with Gaussian approximations, thus enabling us to use a higher order
version of the log-ODE method. If this coupling were successful, then we could expect a better
convergence rate in the Wasserstein metric for our resultant approximation scheme. Indeed, a
truncation of the log-signature to level $m$ produces a log-ODE scheme with strong convergence
in $L^2$ of order $O(h^{m/2})$ [41, Theorem 4.1].
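
As a rough illustration of what "replacing Levy area increments by Gaussians" can look like at level $m = 2$, the sketch below samples an antisymmetric matrix whose entries above the diagonal are independent $N(0, h^2/4)$ variables. This matches only the unconditional mean and variance of the Levy area of Brownian motion over a step of length $h$ (distinct pairs being uncorrelated); it is not the construction used in this paper, which works conditionally on the Brownian increments. The function name and its parameters are hypothetical.

```python
import random


def gaussian_levy_area_standin(d, h, rng):
    """Sample a d x d antisymmetric matrix with N(0, h^2/4) entries above the diagonal.

    Assumption: only the unconditional second moments of the Levy area over a
    step of size h are matched; no conditioning on the increments is attempted.
    """
    std = h / 2.0  # standard deviation, so that the variance is h^2 / 4
    area = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            a = rng.gauss(0.0, std)
            area[i][j] = a
            area[j][i] = -a  # antisymmetry: A^{ji} = -A^{ij}
    return area


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed, for reproducibility of the demo only
    for row in gaussian_levy_area_standin(3, 0.1, rng):
        print(row)
```

Such a stand-in would enter a level-2 log-ODE step in place of the true Levy area; whether it yields a good coupling is exactly the question addressed by the results above.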

One difficulty is that one would need to extend the proof of Proposition 6.2 to couple not only
the Levy area increments, but also the third iterated integrals, conditional on the underlying
Brownian increments. This may not be possible without violating the matrix non-degeneracy
conditions needed for Davie's coupling proof [23, Theorem 1]. A more pronounced obstacle is
the task of establishing the necessary lifted rough path estimates of Section 8. The difference of
the iterated Baker-Campbell-Hausdorff expansions of each piecewise abelian rough path would
be significantly more complex because levels 2 and 3 of the group increments would not be
equal. The authors see no solution at present.


Appendix: Iterated Baker-Campbell-Hausdorff formula 

The Baker-Campbell-Hausdorff formula links the structure of a Lie group with the corresponding
structure on its associated Lie algebra. It does this by expressing the logarithm of the
product of two Lie group elements as a Lie algebra element using only Lie algebraic operations.

Theorem 11.1 (Baker-Campbell-Hausdorff formula). Let $\mathcal{G}$ be a Lie group with group product
$\circ$ and the corresponding Lie algebra $\mathfrak{g}$ defined over any field of characteristic 0. Let $\exp : \mathfrak{g} \to \mathcal{G}$
be the exponential map. Then for every pair $x, y \in \mathfrak{g}$,

\[ H(x, y) := \log\big(\exp(x) \circ \exp(y)\big) \]

can be written as a formal infinite sum of elements of $\mathfrak{g}$. The first terms of order less than or
equal to 5 are given by

\[
\begin{aligned}
H(x,y) = {} & x + y + \tfrac{1}{2}[x,y] + \tfrac{1}{12}\big([x,[x,y]] + [y,[y,x]]\big) - \tfrac{1}{24}[y,[x,[x,y]]] \\
& - \tfrac{1}{720}\big([[[[x,y],y],y],y] + [[[[y,x],x],x],x]\big) \\
& + \tfrac{1}{360}\big([[[[x,y],y],y],x] + [[[[y,x],x],x],y]\big) \\
& + \tfrac{1}{120}\big([[[[y,x],y],x],y] + [[[[x,y],x],y],x]\big) + \dots
\end{aligned}
\]
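
The low-order terms of the Baker-Campbell-Hausdorff formula can be checked exactly in a nilpotent setting. The sketch below is an illustration, not part of the paper: it uses strictly upper triangular $4 \times 4$ rational matrices, for which every product of four factors vanishes, so the exponential, the logarithm and the BCH series all truncate after order 3. The matrices `x`, `y` and all helper names are arbitrary choices made here.

```python
from fractions import Fraction

N = 4  # strictly upper triangular 4x4 matrices: every product of 4 factors is zero


def mat(rows):
    return [[Fraction(v) for v in row] for row in rows]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(N)] for i in range(N)]


def scale(c, a):
    return [[c * a[i][j] for j in range(N)] for i in range(N)]


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return add(mul(a, b), scale(-1, mul(b, a)))


IDENTITY = mat([[int(i == j) for j in range(N)] for i in range(N)])


def exp_nilpotent(a):
    """exp(a) = I + a + a^2/2 + a^3/6, exact because a^4 = 0."""
    a2 = mul(a, a)
    a3 = mul(a2, a)
    return add(add(IDENTITY, a), add(scale(Fraction(1, 2), a2), scale(Fraction(1, 6), a3)))


def log_unipotent(g):
    """log(I + m) = m - m^2/2 + m^3/3, exact because m^4 = 0."""
    m = add(g, scale(-1, IDENTITY))
    m2 = mul(m, m)
    m3 = mul(m2, m)
    return add(m, add(scale(Fraction(-1, 2), m2), scale(Fraction(1, 3), m3)))


def bch_order3(x, y):
    """x + y + [x,y]/2 + ([x,[x,y]] + [y,[y,x]])/12."""
    xy = bracket(x, y)
    cubic = add(bracket(x, xy), bracket(y, bracket(y, x)))
    return add(add(x, y), add(scale(Fraction(1, 2), xy), scale(Fraction(1, 12), cubic)))


x = mat([[0, 1, 2, 0], [0, 0, 1, 3], [0, 0, 0, 1], [0, 0, 0, 0]])
y = mat([[0, 2, 0, 1], [0, 0, -1, 0], [0, 0, 0, 4], [0, 0, 0, 0]])
lhs = log_unipotent(mul(exp_nilpotent(x), exp_nilpotent(y)))
rhs = bch_order3(x, y)
print(lhs == rhs)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison is an identity rather than a floating-point approximation.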

We can also consider the iterated product:

\[ H(x_1, \dots, x_n) := \log\big(e^{x_1} \circ \dots \circ e^{x_n}\big). \]

This is the iterated version of the Baker-Campbell-Hausdorff formula, which is used in the proof
of the Chen-Strichartz development formula; the latter gives an explicit expression for
the logarithm of the Brownian signature lift $\log S_N(W)_{s,t}$ ([20], [78]).

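Theorem 11.2 (Iterated Baker-Campbell-Hausdorff formula). The iterated Hausdorff coefficient
has the form:

\[ H(x_1, \dots, x_n) = \sum_{k \ge 1} \sum_{P \in B_k} \frac{(-1)^{k-1}}{k} \, \frac{1}{P \cdot P!} \, X^P, \]

where $B_k$, $P$, $P!$ and $X^P$ are given by the following expressions:

\[ B_k = \Big\{ (p^i_j)_{i \in \{1,\dots,n\},\, j \in \{1,\dots,k\}} : p^i_j \in \mathbb{N}, \ \forall j \in \{1,\dots,k\}, \ \sum_{i=1}^{n} p^i_j > 0 \Big\}, \]

\[ P = \sum_{j=1}^{k} \sum_{i=1}^{n} p^i_j, \qquad P! = \prod_{j=1}^{k} \prod_{i=1}^{n} p^i_j!, \]

\[ X^P = [\underbrace{x_1, \dots [x_1}_{p^1_1 \text{ times}}, \dots [\underbrace{x_n, \dots [x_n}_{p^n_1 \text{ times}}, \dots [\underbrace{x_1, \dots [x_1}_{p^1_k \text{ times}}, \dots [\underbrace{x_n, \dots [x_n, x_n}_{p^n_k \text{ times}}] \dots ]. \]

We refer to [H Appendix B], [T?] §3.2] and [72, Theorem 3.11] for further details. By
induction it can be shown (cf. Example 3.2 of [6]) that the first terms of the expansion up to
nested Lie brackets of length 4 are given by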

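\[
\begin{aligned}
H(x_1, \dots, x_n) = {} & \sum_{i=1}^{n} x_i + \frac{1}{2} \sum_{1 \le i < j \le n} [x_i, x_j] \\
& + \frac{1}{4} \sum_{1 \le i < j < k \le n} [[x_i, x_j], x_k] + \frac{1}{12} \sum_{i,j} \sum_{k > i \vee j} [x_i, [x_j, x_k]] + \frac{1}{12} \sum_{1 \le i < j \le n} [x_j, [x_j, x_i]] \\
& + \frac{1}{8} \sum_{1 \le i < j < k < l \le n} [[[x_i, x_j], x_k], x_l] + \frac{1}{24} \sum_{i,j} \sum_{l > k > i \vee j} [[x_i, [x_j, x_k]], x_l] + \frac{1}{24} \sum_{1 \le i < j < k \le n} [[x_j, [x_j, x_i]], x_k] \\
& + \frac{1}{24} \sum_{1 \le i < j < k \le n} \Big( \sum_{l < k} \big( [x_l, [[x_i, x_j], x_k]] + [[x_i, x_j], [x_l, x_k]] \big) + [x_k, [x_k, [x_i, x_j]]] \Big) \\
& - \frac{1}{24} \sum_{i,j} \sum_{k > i \vee j} [x_k, [x_i, [x_j, x_k]]] + \dots
\end{aligned} \tag{11.1}
\]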
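
The terms of the iterated expansion up to brackets of length 3 can be verified exactly in the same nilpotent matrix model. The sketch below is an illustration under stated assumptions: it takes the length-3 terms to be $\tfrac14\sum_{i<j<k}[[x_i,x_j],x_k] + \tfrac1{12}\sum_{i,j}\sum_{k>i\vee j}[x_i,[x_j,x_k]] + \tfrac1{12}\sum_{i<j}[x_j,[x_j,x_i]]$, which is what one obtains by applying the two-element formula inductively to $H(H(x_1,\dots,x_{n-1}), x_n)$. The test matrices and helper names are arbitrary.

```python
from fractions import Fraction

N = 4  # strictly upper triangular 4x4 matrices: brackets of length >= 4 vanish


def mat(rows):
    return [[Fraction(v) for v in row] for row in rows]


def add(*ms):
    return [[sum(m[i][j] for m in ms) for j in range(N)] for i in range(N)]


def scale(c, a):
    return [[c * a[i][j] for j in range(N)] for i in range(N)]


def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def br(a, b):
    """Matrix commutator [a, b]."""
    return add(mul(a, b), scale(-1, mul(b, a)))


IDENTITY = mat([[int(i == j) for j in range(N)] for i in range(N)])


def exp_nilpotent(a):
    a2 = mul(a, a)
    return add(IDENTITY, a, scale(Fraction(1, 2), a2), scale(Fraction(1, 6), mul(a2, a)))


def log_unipotent(g):
    m = add(g, scale(-1, IDENTITY))
    m2 = mul(m, m)
    return add(m, scale(Fraction(-1, 2), m2), scale(Fraction(1, 3), mul(m2, m)))


def log_of_product(xs):
    """log(e^{x_1} ... e^{x_n}) computed directly."""
    g = IDENTITY
    for x in xs:
        g = mul(g, exp_nilpotent(x))
    return log_unipotent(g)


def iterated_bch_order3(xs):
    """Iterated BCH expansion truncated after nested brackets of length 3."""
    n = len(xs)
    terms = list(xs)  # length 1
    for i in range(n):
        for j in range(i + 1, n):
            terms.append(scale(Fraction(1, 2), br(xs[i], xs[j])))  # length 2
            terms.append(scale(Fraction(1, 12), br(xs[j], br(xs[j], xs[i]))))
            for k in range(j + 1, n):
                terms.append(scale(Fraction(1, 4), br(br(xs[i], xs[j]), xs[k])))
    for i in range(n):
        for j in range(n):
            for k in range(max(i, j) + 1, n):
                terms.append(scale(Fraction(1, 12), br(xs[i], br(xs[j], xs[k]))))
    return add(*terms)


xs = [
    mat([[0, 1, 2, 0], [0, 0, 1, 3], [0, 0, 0, 1], [0, 0, 0, 0]]),
    mat([[0, 2, 0, 1], [0, 0, -1, 0], [0, 0, 0, 4], [0, 0, 0, 0]]),
    mat([[0, -1, 1, 5], [0, 0, 3, 2], [0, 0, 0, -2], [0, 0, 0, 0]]),
    mat([[0, 3, 0, 0], [0, 0, 1, -1], [0, 0, 0, 2], [0, 0, 0, 0]]),
]
print(log_of_product(xs) == iterated_bch_order3(xs))
```

Restricting to the first two sums recovers the level-2 situation relevant to Levy area, where only $\sum_i x_i$ and $\tfrac12\sum_{i<j}[x_i,x_j]$ appear.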

We now specialise to the case of $(\mathcal{G}, \circ, \mathfrak{g}) = (G^{(\kappa)}(\mathbb{R}^d), \otimes, \mathfrak{g}^{(\kappa)}(\mathbb{R}^d))$ and present a useful technical
lemma for expressing the difference of two iterated BCHF expansions as the global difference
at the Lie algebra, rather than the local difference at the Lie group level.



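Lemma 11.3. Fix sequences $\{x_j\}_{j=1}^{n}, \{y_j\}_{j=1}^{n} \in \mathfrak{g}^{(\kappa)}(\mathbb{R}^d)$ and set

\[ g_i = \pi_i\big(H(x_1, \dots, x_n)\big), \qquad h_i = \pi_i\big(H(y_1, \dots, y_n)\big) \in \mathfrak{g}^{(\kappa)}(\mathbb{R}^d). \]

Then for every integer $m \le \kappa$,

\[ \big\| \pi_m\big(e^{x_1} \otimes \dots \otimes e^{x_n} - e^{y_1} \otimes \dots \otimes e^{y_n}\big) \big\| \le \sum_{k=1}^{m} \sum_{\substack{i_1, \dots, i_k > 0 \\ i_1 + \dots + i_k = m}} \sum_{l=1}^{k} \big\| g_{i_1} \otimes \dots \otimes g_{i_{l-1}} \otimes (g_{i_l} - h_{i_l}) \otimes h_{i_{l+1}} \otimes \dots \otimes h_{i_k} \big\|. \]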




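Proof. Recall the well-known non-commutative identity (cf. [7, §4]),

\[ \bigotimes_{i=1}^{n} a_i - \bigotimes_{i=1}^{n} b_i = \sum_{j=1}^{n} \Big( \bigotimes_{i=1}^{j-1} a_i \Big) \otimes (a_j - b_j) \otimes \Big( \bigotimes_{i=j+1}^{n} b_i \Big), \tag{11.2} \]

for any sequences $\{a_j\}$, $\{b_j\}$, with the convention that $\bigotimes_{i=1}^{0} a_i = 1$. It follows that for every
positive integer $m$,

\[
\begin{aligned}
\big\| \pi_m\big(e^{x_1} \otimes \dots \otimes e^{x_n} - e^{y_1} \otimes \dots \otimes e^{y_n}\big) \big\|
&= \big\| \pi_m\big(e^{H(x_1,\dots,x_n)} - e^{H(y_1,\dots,y_n)}\big) \big\| \\
&= \Big\| \pi_m\Big( \sum_{k=0}^{m} \frac{H(x_1,\dots,x_n)^{\otimes k} - H(y_1,\dots,y_n)^{\otimes k}}{k!} \Big) \Big\| \\
&\le \sum_{k=1}^{m} \frac{1}{k!} \big\| \pi_m\big( H(x_1,\dots,x_n)^{\otimes k} - H(y_1,\dots,y_n)^{\otimes k} \big) \big\| \\
&= \sum_{k=1}^{m} \frac{1}{k!} \Big\| \sum_{\substack{i_1, \dots, i_k > 0 \\ i_1 + \dots + i_k = m}} \big( g_{i_1} \otimes \dots \otimes g_{i_k} - h_{i_1} \otimes \dots \otimes h_{i_k} \big) \Big\| \\
&\le \sum_{k=1}^{m} \sum_{\substack{i_1, \dots, i_k > 0 \\ i_1 + \dots + i_k = m}} \sum_{l=1}^{k} \big\| g_{i_1} \otimes \dots \otimes g_{i_{l-1}} \otimes (g_{i_l} - h_{i_l}) \otimes h_{i_{l+1}} \otimes \dots \otimes h_{i_k} \big\|.
\end{aligned}
\]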

The proof is complete. □ 
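
The telescoping identity (11.2), $\bigotimes_i a_i - \bigotimes_i b_i = \sum_j (\bigotimes_{i<j} a_i) \otimes (a_j - b_j) \otimes (\bigotimes_{i>j} b_i)$, holds in any associative algebra. The sketch below checks it with integer matrices, where the matrix product stands in for $\otimes$; this substitution and all names are assumptions made for illustration.

```python
def identity(size):
    return [[int(i == j) for j in range(size)] for i in range(size)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def matsub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def ordered_product(factors, size):
    """Product of the factors in order; the empty product is the identity."""
    result = identity(size)
    for f in factors:
        result = matmul(result, f)
    return result


def telescoped_difference(a_list, b_list):
    """Right-hand side of (11.2), with matrix multiplication as the product."""
    size = len(a_list[0])
    total = [[0] * size for _ in range(size)]
    for j in range(len(a_list)):
        left = ordered_product(a_list[:j], size)        # a_1 ... a_{j-1}
        middle = matsub(a_list[j], b_list[j])            # a_j - b_j
        right = ordered_product(b_list[j + 1:], size)    # b_{j+1} ... b_n
        total = matadd(total, matmul(matmul(left, middle), right))
    return total


a_list = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [1, 1]], [[1, 1], [0, 2]]]
b_list = [[[1, 0], [2, 1]], [[3, 1], [0, 1]], [[1, 1], [1, 0]], [[0, 2], [1, 1]]]
direct = matsub(ordered_product(a_list, 2), ordered_product(b_list, 2))
print(direct == telescoped_difference(a_list, b_list))
```

Because the matrices do not commute, the check exercises the ordering of the factors, which is the point of the identity.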

We conclude the appendix with a useful technical lemma (see [6, Lemma 3.1]).

Lemma 11.4. Any Lie bracket of elements $x_1, \dots, x_n \in \mathfrak{g}$ is a linear combination with coefficients
of $\pm 1$ of nested commutators of the form

\[ x_{i_1} * x_{i_2} * \dots * x_{i_{n-1}} * x_{i_n} := [x_{i_1}, [x_{i_2}, [\dots [x_{i_{n-1}}, x_{i_n}]]]]. \]

Proof. The proof relies upon the Jacobi identity and induction on the length of the commutator.
The induction basis is the identity $[[x_1, x_2], [x_3, x_4]] = [x_4, [x_3, [x_1, x_2]]] - [x_3, [x_4, [x_1, x_2]]]$. □
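
The induction basis of Lemma 11.4 is a direct consequence of the Jacobi identity, and it can be checked with matrix commutators. The sketch below, with arbitrarily chosen integer matrices and hypothetical helper names, verifies two equivalent right-nested forms of $[[x_1,x_2],[x_3,x_4]]$.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def br(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return sub(matmul(a, b), matmul(b, a))


x1 = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
x2 = [[0, 1, 1], [2, 0, 1], [1, 3, 0]]
x3 = [[2, 0, 1], [1, 1, 0], [0, 2, 3]]
x4 = [[1, 1, 0], [0, 2, 1], [3, 0, 1]]

lhs = br(br(x1, x2), br(x3, x4))

# Form A: expand around the inner bracket [x1, x2].
form_a = sub(br(x4, br(x3, br(x1, x2))), br(x3, br(x4, br(x1, x2))))
# Form B: expand around the inner bracket [x3, x4].
form_b = sub(br(x1, br(x2, br(x3, x4))), br(x2, br(x1, br(x3, x4))))

print(lhs == form_a, lhs == form_b)
```

Both forms express the bracket as a combination, with coefficients $\pm 1$, of nested commutators of the shape described in the lemma.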


Acknowledgements 

The authors would like to thank Horatio Boedihardjo, Philippe Charmoy, Lajos Gyurko, Ben 
Hambly, Sean Ledger, Harald Oberhauser and Danyu Yang for many useful discussions. Special 
thanks go to Prof. Davie of Edinburgh for answering many questions about his original proof in 
[23], as well as Dr. Weijun Xu of Warwick for his help in Berlin. The research is supported by 
the European Research Council under the European Union’s Seventh Framework Programme 
(FP7-IDEAS-ERC, ERC grant agreement nr. 291244). The authors are grateful for the support
of the Oxford-Man Institute. 


References 

[1] A. Alfonsi, B. Jourdain, and A. Kohatsu-Higa. Optimal transport bounds between the 
time-marginals of a multidimensional diffusion and its Euler scheme. arXiv preprint 
arXiv:1405.7007, 2014. 

[2] A. Alfonsi, B. Jourdain, and A. Kohatsu-Higa. Pathwise optimal transport bounds between 
a one-dimensional diffusion and its Euler scheme. The Annals of Applied Probability, 
24(3):1049-1080, 2014.

[3] C.J.S. Alves and A.B. Cruzeiro. Monte Carlo simulation of stochastic differential systems - 
a geometrical approach. Stochastic Processes and their Applications, 118(3):346-367, 2008.

[4] G. Ben Arous. Flots et series de Taylor stochastiques. Probability Theory and Related Fields,
81(1):29-77, 1989.

[5] I. Bailleul. Flows driven by Banach space-valued rough paths. In Seminaire de Probabilites 
XLVI, pages 195-205. Springer, 2014. 

[6] Z. Balogh, R. Berger, R. Monti, and J. Tyson. Exceptional sets for self-similar fractals 
in Carnot groups. In Mathematical Proceedings of the Cambridge Philosophical Society, 
volume 149, pages 147-172. Cambridge Univ Press, 2010. 

[7] R.F. Bass, B.M. Hambly, and T. J. Lyons. Extending the Wong-Zakai theorem to reversible 
Markov processes. Journal of the European Mathematical Society, 4(3):237-269, 2002. 

[8] F. Baudoin. An introduction to the geometry of stochastic flows. Imperial College Press, 
2004. 

[9] C. Bayer, P. Friz, S. Riedel, and J. Schoenmakers. From rough path estimates to multilevel 
Monte Carlo. arXiv preprint arXiv:1305.5779, 2013. 

[10] J. Blanchet, X. Chen, and J. Dong. Epsilon-Strong Simulation for Multidimensional
Stochastic Differential Equations via Rough Path Analysis. arXiv preprint
arXiv:1403.5722, 2014.

[11] Y. Boutaib, L.G. Gyurko, T.J. Lyons, and D. Yang. Dimension-free Euler estimates of 
rough differential equations. Rev. Roumaine Math. Pures Appl., 2013.

[12] E. Breuillard. Local limit theorems and equidistribution of random walks on the Heisenberg 
group. Geometric and Functional Analysis, 15(1):35-82, 2005.

[13] E. Breuillard, P. Friz, and M. Huesmann. From random walks to rough paths. Proceedings 
of the American Mathematical Society, 137(10):3487-3496, 2009. 

[14] R.J. Cameron and J.M.C. Clark. The maximum rate of convergence of discrete approximations
for stochastic differential equations. Springer, 1980.

[15] T. Cass, C. Litterer, and T.J. Lyons. Integrability and tail estimates for Gaussian rough 
differential equations. The Annals of Probability, 41(4):3026-3050, 2013. 


[16] T. Cass and T.J. Lyons. Evolving communities with individual preferences. Proceedings 
of the London Mathematical Society, pages 83-107, 2014. 

[17] F. Castell. Asymptotic expansion of stochastic flows. Probability theory and related fields, 
96(2):225-239, 1993. 

[18] F. Castell and J. Gaines. An efficient approximation method for stochastic differential 
equations by means of the exponential Lie series. Mathematics and computers in simulation,
38(1):13-19, 1995.

[19] F. Castell and J. Gaines. The ordinary differential equation approach to asymptotically 
efficient schemes for solution of stochastic differential equations. In Annales de l'Institut
Henri Poincare, Probabilites et Statistiques, volume 32, pages 231-250. Elsevier, 1996. 

[20] K.T. Chen. Integration of paths, geometric invariants and a generalized Baker-Hausdorff 
formula. Annals of Mathematics, pages 163-178, 1957. 

[21] A.B. Cruzeiro and P. Malliavin. Numerical approximation of diffusions in R^d using normal
charts of a Riemannian manifold. Stochastic processes and their applications, 116(7):1088- 
1095, 2006. 

[22] A.B. Cruzeiro, P. Malliavin, and A. Thalmaier. Geometrization of Monte-Carlo numerical 
analysis of an elliptic operator: strong approximation. Comptes Rendus Mathematique, 
338(6):481-486, 2004. 

[23] A. Davie. KMT theory applied to approximations of SDE. In Stochastic Analysis and 
Applications 2014, pages 185-201. Springer, 2014.

[24] A. Davie. Pathwise approximation of stochastic differential equations using coupling. 
preprint, 2014. 

[25] S. Dereich. Multilevel Monte Carlo algorithms for Levy-driven SDEs with Gaussian correction.
The Annals of Applied Probability, 21(1):283-311, 2011.

[26] A. Deya, A. Neuenkirch, and S. Tindel. A Milstein-type scheme without Levy area terms 
for SDEs driven by fractional Brownian motion. In Annales de l'Institut Henri Poincare,
Probabilites et Statistiques, volume 48, pages 518-550. Institut Henri Poincare, 2012. 

[27] A.S. Dickinson. Optimal Approximation of the Second Iterated Integral of Brownian 
Motion. Stochastic Analysis and Applications, 25(5):1109-1128, 2007.

[28] U. Einmahl. Extensions of results of Komlos, Major, and Tusnady to the multivariate 
case. Journal of multivariate analysis, 28(1):20-68, 1989.

[29] M. Fliess and D. Normand-Cyrot. Algebres de Lie nilpotentes, formule de Baker-Campbell-
Hausdorff et integrales iterees de K.T. Chen. Seminaire de Probabilites LNM, 920, 1982.

[30] P. Friz and M. Hairer. A course on rough paths. Preprint, 2014. 

[31] P. Friz and H. Oberhauser. Rough path limits of the Wong-Zakai type with a modified 
drift term. Journal of Functional Analysis, 256(10):3236-3256, 2009. 

[32] P. Friz and S. Riedel. Integrability of (non-) linear rough differential equations and integrals.
Stochastic Analysis and Applications, 31(2):336-358, 2013.

[33] P. Friz and S. Riedel. Convergence rates for the full Gaussian rough paths. In Annales de
l'Institut Henri Poincare, Probabilites et Statistiques, volume 50, pages 154-194. Institut
Henri Poincare, 2014. 

[34] P. Friz and N. Victoir. Approximations of the Brownian rough path with applications to 
stochastic analysis. In Annales de l'Institut Henri Poincare, Probabilites et Statistiques,
volume 41, pages 703-724, 2005. 


[35] P. Friz and N. Victoir. Euler estimates for rough differential equations. Journal of Differential
Equations, 244(2):388-412, 2008.

[36] P. Friz and N. Victoir. Multidimensional stochastic processes as rough paths: theory and 
applications. Cambridge University Press, 2010. 

[37] J.G. Gaines and T. J. Lyons. Random generation of stochastic area integrals. SIAM Journal 
on Applied Mathematics, 54(4):1132-1146, 1994. 

[38] M. Gelbrich. Simultaneous time and chance discretization for stochastic differential equations.
Journal of computational and applied mathematics, 58(3):255-289, 1995.

[39] M. Gelbrich and S. Rachev. Discretization for stochastic differential equations, Lp Wasser- 
stein metrics, and econometrical models. Lecture Notes-Monograph Series, pages 97-119, 
1996. 

[40] E. Gobet. Weak approximation of killed diffusion using Euler schemes. Stochastic processes 
and their applications, 87(2):167-197, 2000.

[41] L.G. Gyurko and T.J. Lyons. Rough paths based numerical algorithms in computational 
finance. Mathematics in Finance: UIMP-RSME Lluis A. Santalo Summer School, 2008. 

[42] M. Hutzenthaler, A. Jentzen, and P.E. Kloeden. Strong and weak divergence in finite 
time of Euler’s method for stochastic differential equations with non-globally Lipschitz 
continuous coefficients. Proc. R. Soc. London Ser. A, 467(2130):1563-1576, 2011. 

[43] A. Jentzen and P.E. Kloeden. Taylor approximations for stochastic partial differential 
equations, volume 83. SIAM, 2011. 

[44] S. Kanagawa. The rate of convergence for approximate solutions of stochastic differential 
equations. Tokyo J. Math, 12(1), 1989. 

[45] L.V. Kantorovich. On a problem of Monge (Russian). Uspekhi Mat. Nauk., 3:225-226, 
1948. 

[46] P.E. Kloeden and A. Neuenkirch. Convergence of numerical methods for stochastic differential
equations in mathematical finance. Recent Developments in Computational Finance:
Foundations, Algorithms and Applications, Interdisciplinary Mathematical Sciences Series, 
14:49-80, 2013. 

[47] P.E. Kloeden, E. Platen, and I. Wright. The approximation of multiple stochastic integrals. 
Stochastic analysis and applications, 10(4):431-441, 1992. 

[48] J. Komlos, P. Major, and G. Tusnady. An approximation of partial sums of independent 
random variables and the sample distribution function. Zeitschrift fur Wahrscheinlichkeit- 
stheorie und verwandte Gebiete, 32(1-2):111-131, 1975. 

[49] M. Ledoux, Z. Qian, and T. Zhang. Large deviations and support theorem for diffusion 
processes via rough paths. Stochastic processes and their applications, 102(2):265-283, 
2002.

[50] A. Lejay and T.J. Lyons. On the importance of the Levy area for studying the limits 
of functions of converging stochastic processes. Application to homogenization. In Current
trends in potential theory, volume 7. The Theta foundation/American Mathematical
Society, 2003. 

[51] T.J. Lyons. Differential equations driven by rough signals. I. An extension of an inequality 
of L.C. Young. Math. Res. Lett, 1(4):451-464, 1994.

[52] T.J. Lyons. Differential equations driven by rough signals. Revista Matematica Iberoamer- 
icana, 14(2):215-310, 1998. 


[53] T.J. Lyons. Rough paths, Signatures and the modelling of functions on streams. arXiv 
preprint arXiv:1405.4537, 2014.

[54] T.J. Lyons, M. Caruana, and T. Levy. Differential equations driven by rough paths. Ecole 
d'ete des probabilites de Saint-Flour, 34:2007, 2004.

[55] T.J. Lyons and Z. Qian. System, control and rough paths. Oxford University Press, 2002. 

[56] T.J. Lyons and N. Sidorova. Sound compression: a rough path approach. In Proceedings 
of the 4th international symposium on Information and communication technologies, pages
223-228. Trinity College Dublin, 2005. 

[57] T.J. Lyons and N. Victoir. Cubature on Wiener space. Proceedings of the Royal Society of 
London. Series A: Mathematical, Physical and Engineering Sciences, 460(2041):169-198, 
2004. 

[58] T.J. Lyons and N. Victoir. An extension theorem to rough paths. In Annales de l'Institut
Henri Poincare (C) Non Linear Analysis, volume 24, pages 835-847. Elsevier, 2007. 

[59] T.J. Lyons and D. Yang. Rough differential equations in Banach space driven by weak 
geometric p-rough paths. arXiv preprint arXiv:1402.2900, 2014. 

[60] T.J. Lyons and D. Yang. The theory of rough paths via one-forms and the extension of 
an argument of Schwartz to RDEs. arXiv preprint arXiv:1503.06175, 2015.

[61] W. Magnus. On the exponential solution of differential equations for a linear operator. 
Communications on pure and applied mathematics, 7(4):649-673, 1954. 

[62] F. Malrieu. Convergence to equilibrium for granular media equations and their Euler 
schemes. The Annals of Applied Probability, 13(2):540-560, 2003. 

[63] G. Maruyama. Continuous Markov processes and stochastic equations. Rendiconti del 
Circolo Matematico di Palermo, 4(1):48-90, 1955.

[64] G.N. Milshtein. Approximate integration of stochastic differential equations (Russian). 
Teor. Veroyatnost. i Primenen, 19(3):583-588, 1974. 

[65] G. Monge. Memoire sur la theorie des deblais et des remblais. Memoires de l'Academie
Royale des Sciences, XVIII-XIX:666-704, 1781. 

[66] A. Neuenkirch, S. Tindel, and J. Unterberger. Discretizing the fractional Levy area. 
Stochastic Processes and Their Applications, 120(2):223-254, 2010. 

[67] S. Ninomiya and N. Victoir. Weak approximation of stochastic differential equations and 
application to derivative pricing. Applied Mathematical Finance, 15(2):107-121, 2008.

[68] G. Pages and A. Sellami. Convergence of multi-dimensional quantized SDEs. In Seminaire 
de probabilites XLIII, pages 269-307. Springer, 2011. 

[69] G. Pap. Central limit theorems on nilpotent Lie groups. Probab. Math. Stat, 14(2):287-312, 
1993. 

[70] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical recipes 3rd 
edition: The art of scientific computing. Cambridge University Press, 2007.

[71] S. Rachev and L. Ruschendorf. Mass transportation problems, I and II: theory and applications,
1998.

[72] C. Reutenauer. Free Lie algebras, volume 7 of London Mathematical Society Monographs. 
New Series, 1993. 

[73] S. Riedel. Talagrand’s transportation-cost inequality and applications to (rough) path 
spaces. arXiv preprint arXiv:1403.2585, 2014. 


[74] S. Riedel and W. Xu. A simple proof of distance bounds for Gaussian rough paths. Electron. 
J. Probab, 18(108):1-22, 2013.

[75] T. Ryden and M. Wiktorsson. On the simulation of iterated Ito integrals. Stochastic 
processes and their applications, 91(1):151-168, 2001.

[76] E.M. Sipilainen. Pathwise view on solutions of stochastic differential equations. Ph.D 
Thesis, University of Edinburgh, 1993. 

[77] E.M. Stein. Singular integrals and differentiability properties of functions, volume 2. 
Princeton University Press, 1970. 

[78] R.S. Strichartz. The Campbell-Baker-Hausdorff-Dynkin formula and solutions of differential
equations. Journal of Functional Analysis, 72(2):320-345, 1987.

[79] D. Talay. Stochastic Hamiltonian systems: exponential convergence to the invariant measure,
and discretization by the implicit Euler scheme. Markov Process. Related Fields,
8(2):163-198, 2002.

[80] D. Talay and L. Tubaro. Expansion of the global error for numerical schemes solving 
stochastic differential equations. Stochastic analysis and applications, 8(4):483-509, 1990. 

[81] L.N. Vaserstein. Markov processes over denumerable products of spaces describing large 
system of automata (Russian). Problemy Peredaci Informacii, 5:64-72, 1969. 

[82] C. Villani. Topics in Optimal Transportation. Number 58. American Mathematical Soc., 
2003. 

[83] M. Wiktorsson. Joint characteristic function and simultaneous simulation of iterated Ito 
integrals for multiple independent Brownian motions. Annals of Applied Probability, pages 
470-487, 2001. 

[84] L.C. Young. An inequality of the Holder type, connected with Stieltjes integration. Acta 
Mathematica, 67(1):251-282, 1936. 

[85] A.Y. Zaitsev. Multidimensional version of a result of Sakhanenko in the invariance principle 
for vectors with finite exponential moments. I,II,III. Teor. Veroyatnost. i Primenen 45 
(2000), 718-738; 46 (2001), 535-561 and 744-769. Translations in Theory Probab. Appl. 45 
(2002), 626-641; 46 (2003), 490-514 and 676-698. 

[86] A.Y. Zaitsev. Estimates for the quantiles of smooth conditional distributions and the multidimensional
invariance principle. Siberian Mathematical Journal, 37(4):706-729, 1996.

[87] A.Y. Zaitsev. Multidimensional version of the results of Komlos, Major and Tusnady for 
vectors with finite exponential moments. ESAIM: Probability and Statistics, 2:41-108, 
1998. 

